ABSTRACT

A meta search system accepts natural language queries
which are parsed to extract relevant content, this relevant
content being formed into queries suitable for each of a
selected number of search engines and being transmitted
thereto. The results from the search engines are received and
examined and a selected number of the information sources
represented therein are obtained. These obtained information
sources are then examined to rank their relevance to the
extracted relevant content and the portions of interest in each
of these ranked information sources are determined. The
determined portions are output to the user in ranked order,
having first been processed to clean up the portions to
include valid formatting and complete paragraphs and/or
sentences.

13 Claims, 16 Drawing Sheets 
NATURAL LANGUAGE META-SEARCH
SYSTEM AND METHOD

FIELD OF THE INVENTION

The present invention relates to a system and method of
processing queries for information. More specifically, the
present invention relates to a meta-search system and
method for accepting a natural language query which is
processed to retrieve information from one or more
information sources via at least one search engine and to extract
relevant portions of those information sources for output to
the originator of the query.

BACKGROUND OF THE INVENTION

Systems and methods for locating information in databases
are known. An area in which such systems and
methods have recently become quite common and heavily
used is searching for information on the World Wide Web
(WWW) and/or on other internet sources.

Typically, an internet user will access a search engine,
such as AltaVista or Yahoo, through a web page maintained
for that purpose by the host of the search engine and will
input search data relating to the information sought into the
search engine. The search data can, for example, comprise
keywords or phrases related to the information sought and
boolean operators to further qualify the search. An example of
such search data is "AZT and Toxicity", wherein AZT is
one keyword, Toxicity is another and the 'and' is a boolean
operator requiring both keywords to be present in the
information source for it to be considered a match.

Once search data is input, the search engine then consults
one or more indices it maintains of web pages or other
information sources that match the search data. A listing of
the information sources that match the search data, often
referred to as "hits", is then displayed to the user, the number
of matches usually being limited to some predefined maximum
number. These matches are typically ranked, usually
according to the number of occurrences of keywords or
phrases in the information source. Generally, the information
which is displayed to the user for each match comprises
a location at which the document can be accessed (a URL for
a WWW document) and some minimal additional information
such as a document title, etc.

Generally, such search engines provide a skilled user with
reasonable results from well defined and/or homogeneous
information sources. For example, the
APS U.S. Patent database can be efficiently searched based
on the contents of well-defined information fields such as
Patent Number, Inventor Name, etc. to locate information
sought.

However, while such search engines can generally provide
a skilled user with reasonable results from such well
defined and/or homogeneous databases, they do suffer from
disadvantages. Specifically, when searching databases or
information sources which are not homogeneous or well
defined, such as the WWW and/or internet, even the best
formed search strategy can result in a hundred or more
matches, many of which are not useful to the user but which
must still be reviewed by the user, to at least some extent, to
determine this. Further, such search engines generally
require the user to understand and be comfortable with
boolean type searches and are limited to this type of search
operation.

To enhance the chances that the desired information will
in fact be located, a user will often perform the same search
on multiple search engines, thus exacerbating the number of
matches which must be reviewed by the user. The use of
more than one search engine can also require the user to
redraft his search data to accommodate different search data
requirements and/or capabilities of the different search
engines. For example, some search engines may only allow
keyword-based searches while others may permit searching
based upon phrases.

These difficulties often result in the less skilled user not
obtaining acceptable search results without multiple and/or
recursive search attempts, which has led many users to adopt
an interactive search technique commonly referred to as
"surfing the web" which, while often entertaining and/or
informative, can be time consuming and may still not locate
the desired information.

Natural Language Query (NLQ) systems are also known
for a variety of purposes. Generally, a NLQ
system accepts a search sentence or phrase in common
everyday (natural) language and parses the input sentence or
phrase to extract meaning from it. For
example, a natural language search phrase used with a
company's financial database may be "Give me a list of the
fourth quarter general ledger expense accounts." This sentence
will be processed by the NLQ system to determine the
information required by the user which is then retrieved
from the financial database as necessary. However, such
NLQ systems are computationally expensive to operate as
the processing required to determine the meaning of a
sentence or phrase is significant. Further, such systems are
generally limited in terms of the scope of the information
which they can access. For example, a different NLQ system
is likely required to correctly process queries relating to a
company's financial information than is required to search a
medical database of obscure diseases. Also, such NLQ
systems generally only produce acceptable results with well
defined and/or homogeneous databases.

It is desired to have a meta-search engine which will
accept natural language search data to search for information
from one or more information sources which need not be
homogeneous or well defined. The meta-search engine would
identify portions of the matching information which it
determines to be relevant to the search data and would
display at least those determined portions to the user.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a novel
meta-search system and method for obtaining information
from information sources which obviates or mitigates at least one
disadvantage of the prior art.

According to a first aspect of the present invention, there
is provided a method of locating information in at least one
information source, comprising the steps of:

(i) accepting a natural language query describing desired
information;

(ii) parsing said natural language query to extract terms
relevant to said desired information;

(iii) creating search data from said extracted terms in a
form appropriate to each of said at least one search engines
and transferring said created search data thereto to initiate
a search;

(iv) receiving results comprising at least a list of information
sources from each of said at least one search engines
and removing redundancies therefrom to produce a reduced
list of information sources;
(v) retrieving complete copies of each information source
in said reduced list;

(vi) examining each said retrieved complete copy relative
to said extracted terms to determine a match ranking therefor
and to identify relevant portions of said information source;
and

(vii) providing said identified relevant portions to said
user in order of said determined rankings.

Preferably, at least two search engines are employed. Also
preferably, the extraction of relevant terms by the natural
language parser includes adding terms which are alternatives
and/or synonyms to terms directly extracted from the natural
language query. Also preferably, the relevant portions provided
to the user are at least complete paragraphs of information.

According to another aspect of the present invention,
there is provided a meta-search system comprising:

a natural language query processor to produce a set of
relevant terms from a natural language query;

a meta-search engine means to communicate with said at
least one search engine, to form from said relevant terms a
search data set for each said at least one search engine which
is in a format defined for said at least one search engine and
to receive search results from said at least one search engine;

filter means to remove redundancies from said received
search results to produce a reduced list of identified information
sources;

information retrieval means to retrieve said identified
information sources;

selection means to examine each information source
retrieved by said information retrieval means and to rank
each said information source relative to said set of relevant
terms and to identify portions of said each said information
source relevant to said extracted terms; and

output means to provide said user with said identified
portions in order of said ranking.

Preferably, at least two search engines are employed. Also
preferably, the extraction of relevant terms by the natural
language query processor includes adding terms which are
alternatives and/or synonyms to terms directly extracted
from the natural language query. Also preferably, the identified
portions output to the user are at least complete
paragraphs of information.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the present invention will now
be described, by way of example only, with reference to the
attached Figures, wherein:

FIG. 1 shows a schematic representation of a meta-search
system in accordance with the present invention;

FIG. 2 shows a schematic representation of a natural
language query processor in accordance with an embodiment
of the present invention;

FIG. 3 shows a schematic representation of a classification
step of the natural language query processor of FIG. 2;

FIGS. 4a through 4e show schematic representations of a
manipulation step of the natural language query processor of
FIG. 2;

FIGS. 5, 5a and 5b show schematic representations of a
meta search engine in accordance with an embodiment of
the present invention;

FIGS. 6, 6a, 6b and 6c show schematic representations of
a selector in accordance with an embodiment of the present
invention;

FIG. 7 shows a schematic representation of an HTML
clean up step in the selector of FIGS. 6, 6a, 6b and 6c; and

FIG. 8 shows a schematic representation of a text clean up
step in the selector of FIGS. 6, 6a, 6b and 6c.

DETAILED DESCRIPTION OF THE
INVENTION

FIG. 1 shows a meta-search system 20 in accordance with
an embodiment of the present invention. As used herein, the
term "meta-search" system and/or method is intended to
comprise a search system and/or method which acts between
a user and one or more search engines. As described below,
the meta-search system can accept a natural language query,
extract relevant terms and/or phrases from that query to
produce search queries appropriate to each of one or more
search engines. The meta-search system has one or more of
these search engines perform a search, and each provides
the meta-search system with a list of 'hits'. The
meta-search engine accumulates these hits and examines
them to remove redundancies. A copy of the complete
information source is retrieved for a pre-selected number of
the non-redundant hits and these copies are examined by the
meta-search engine to determine a ranking for each information
source and to determine the portions of the information
source which relate to the extracted relevant terms.
These portions are output to the user, in ranked order, as the
results of the search.

As shown in FIG. 1, system 20 includes a Natural
Language Query Processor 24 which is operable to receive
Natural Language Search Data 28 and to extract relevant
terms and/or phrases therefrom. Specifically, search data 28
can comprise one or more complete or incomplete sentences
which processor 24 parses.

Referring to FIG. 2, the parsing process 100 employed by
processor 24 is shown. At step 104, search data 28 is
accepted and processed to remove punctuation. At step 108,
groups (words and/or phrases) are classified according to a
preselected classification scheme. Next, the classified
groups are manipulated at step 112 to obtain a list of
extracted relevant terms and this list is expanded, at step 116,
by translating groups of less common phrases into more common
phrases.

Specifically, at step 104 search data 28 is examined to
remove all trailing punctuation, such as "?",
including any of these appearing before a closing single or
double quotation mark. Next, all commas, colons, semicolons
and any "abandoned" punctuation,
with spaces, returns or linefeeds on either or both
sides, is removed. An example of abandoned punctuation is
the hyphen in "take a break - today". Processing then proceeds
to step 108, described below, with reference
to FIG. 3.

FIG. 3 illustrates sub-steps of step 108 wherein, at step
200, each group between quotation marks is classified as a
quote(). The corresponding quotation marks are then
removed from search data 28, i.e. "grand canyon" is
classified as quote(grand canyon).

At step 204, a comparison is performed between the
processed search data 28 and a list of null content phrases,
referred to by the present inventor as "throw away phrases".
Each match between a group in processed search data 28
(other than groups classified as quote()) and the list of null
content phrases is classified as a throw(). Example lists of
null content phrases and null content words, in accordance
with an embodiment of the present invention, are included
herewith as Tables 1 and 2 respectively in Appendix A.
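
By way of illustration only, the punctuation clean up of step 104 and the quotation classification of step 200 might be sketched in Python as follows. The function names, the use of None for a not-yet-classified group, the set of trailing punctuation marks ("?", "!" and ".") and the restriction to double quotation marks are assumptions made for this sketch and are not taken from the description; abandoned punctuation is simplified here to punctuation with whitespace on both sides.

```python
import re
from typing import List, Optional, Tuple

Group = Tuple[Optional[str], str]  # (category, text); None means unclassified


def strip_punctuation(text: str) -> str:
    """Approximate step 104 on a raw query string."""
    text = text.strip()
    # Trailing punctuation, including any just before a closing quotation mark.
    # The set "?", "!" and "." is assumed; the description names only "?".
    text = re.sub(r"[?!.]+(?=['\"]?$)", "", text)
    # Commas, colons and semicolons anywhere in the query.
    text = re.sub(r"[,:;]", "", text)
    # "Abandoned" punctuation, simplified to runs with whitespace on both sides.
    text = re.sub(r"(?<=\s)[^\w\s'\"]+(?=\s)", "", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


def classify_quotes(text: str) -> List[Group]:
    """Approximate step 200: double-quoted spans become quote() groups."""
    groups: List[Group] = []
    pos = 0
    for match in re.finditer(r'"([^"]*)"', text):
        before = text[pos:match.start()].strip()
        if before:
            groups.append((None, before))
        # The quotation marks themselves are dropped from the group text.
        groups.append(("quote", match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        groups.append((None, tail))
    return groups
```

In this sketch the unclassified remainder is kept as whole strings; splitting it into words for the later classification steps is left out for brevity.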
Next, at step 208, an "or" expansion is performed if
required. An "or" expansion is intended to convert phrases
such as "big/huge/jumbo" into distinct terms separated by
or's, i.e. "big or huge or jumbo".
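
As a minimal sketch, assuming that the alternatives are written as words separated by slashes with no surrounding spaces, the "or" expansion of step 208 could be expressed in Python as shown below. The function name expand_or is hypothetical.

```python
import re


def expand_or(text: str) -> str:
    """Turn slash-separated alternatives into terms joined by ' or '."""
    # Only a slash that sits directly between two word characters is expanded,
    # so a stray slash surrounded by spaces is left alone.
    return re.sub(r"(?<=\w)/(?=\w)", " or ", text)


print(expand_or("big/huge/jumbo"))  # big or huge or jumbo
```

A fuller implementation would presumably need to guard against other uses of the slash, such as dates written as 12/05/98 or the expression "and/or", which this sketch would also expand.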

Next, each word in processed search data 28 which has
not been classified as being part of a quote() or a throw() is
examined and categorized. An example of a set of categories
used in a present embodiment of the invention includes:
quote(), throw(), capital(), number(), join(), prep(), adject(),
qword(), or(), rank1() and phrase(), of which quote() and
throw() are discussed above and the remainder of which are
described below. Classification proceeds in the order given
above, with classification of groups as capital()'s being
considered before number()'s, etc.

At step 212 each remaining unclassified word is examined
to determine if it is within the definition of the capital()
category. Specifically, if the first character of the word is
capitalized, the word is classified as a capital(). Adjacent
words which have been classified as capital()'s are combined
into groups which are then classified as capital(),
i.e. capital(Mickey) capital(Mouse) are combined to
capital(Mickey Mouse).
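
The capital() classification and the merging of adjacent capital() words can be illustrated with the following Python sketch. It assumes, purely for illustration, that the query is held as a list of (category, text) pairs in which a category of None marks a word that has not yet been classified; this data layout and the function name are not taken from the description.

```python
from typing import List, Optional, Tuple

Group = Tuple[Optional[str], str]  # (category, text); None means unclassified


def classify_capitals(groups: List[Group]) -> List[Group]:
    """Classify capitalized words and merge adjacent capital() groups."""
    out: List[Group] = []
    for category, text in groups:
        if category is None and text[:1].isupper():
            category = "capital"
            # Merge with an immediately preceding capital() group, if any.
            if out and out[-1][0] == "capital":
                text = out.pop()[1] + " " + text
        out.append((category, text))
    return out


print(classify_capitals([(None, "Mickey"), (None, "Mouse"), (None, "cartoons")]))
# [('capital', 'Mickey Mouse'), (None, 'cartoons')]
```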

At step 216, each remaining unclassified word is examined
to determine if it is within the definition of the number()
category. Specifically, if the first character of the word is a
number, the word is classified as a number().

At step 220, each remaining unclassified word is examined
to determine if it is within the definition of the join()
category. Specifically, the word is compared to a predefined
list of words and, if the word is present in the list, the word
is classified as a join(). An example of a list of words which
are used for classifying join()'s in accordance with an
embodiment of the present invention is included herewith as
Table 3 in Appendix A.

At step 224, each remaining unclassified word is examined
to determine if it is within the definition of the prep()
category. Specifically, the word is compared to a predefined
list of words and, if the word is present in the list, the word
is classified as a prep(). An example of a list of words which
are used for classifying prep()'s in accordance with an
embodiment of the present invention is included herewith as
Table 4 in Appendix A.

At step 228, each remaining unclassified word is examined
to determine if it is within the definition of the adject()
category. Specifically, the word is compared to a predefined
list of words and, if the word is present in the list, the word
is classified as an adject(). An example of a list of words
which are used for classifying adject()'s in accordance with
an embodiment of the present invention is included herewith
as Table 5 in Appendix A.

At step 232, each remaining unclassified word is examined
to determine if it is within the definition of the qword()
category. Specifically, the word is compared to a predefined
list of words and, if the word is present in the list, the word
is classified as a qword(). An example of a list of words
which are used for classifying qword()'s in accordance with
an embodiment of the present invention is included herewith
as Table 6 in Appendix A.

At step 236, each remaining unclassified word is then
deemed to be a phrase(). Adjacent words in processed search
data 28 which are categorized as phrase()'s are combined to
form phrases which are then categorized as phrase().
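
The list-based classification of steps 216 through 236 can be sketched in Python as below. The word lists shown are short hypothetical stand-ins, since Tables 3 through 6 of Appendix A are not reproduced here, and the (category, text) data layout with None for an unclassified word is likewise an assumption of this sketch.

```python
from typing import List, Optional, Tuple

Group = Tuple[Optional[str], str]  # (category, text); None means unclassified

# Hypothetical stand-ins for Tables 3 to 6 of Appendix A.
JOIN_WORDS = {"and", "or"}
PREP_WORDS = {"from", "in", "on", "to"}
ADJECT_WORDS = {"best", "biggest"}
QWORD_WORDS = {"who", "what", "where", "when", "why", "how"}


def classify_word(word: str) -> str:
    """Apply the tests in the order of steps 216, 220, 224, 228, 232, 236."""
    if word[:1].isdigit():
        return "number"
    lowered = word.lower()
    if lowered in JOIN_WORDS:
        return "join"
    if lowered in PREP_WORDS:
        return "prep"
    if lowered in ADJECT_WORDS:
        return "adject"
    if lowered in QWORD_WORDS:
        return "qword"
    return "phrase"  # step 236: anything left over is a phrase()


def classify_remaining(groups: List[Group]) -> List[Group]:
    """Classify unclassified words and merge adjacent phrase() groups."""
    out: List[Group] = []
    for category, text in groups:
        if category is None:
            category = classify_word(text)
            if category == "phrase" and out and out[-1][0] == "phrase":
                text = out.pop()[1] + " " + text
        out.append((category, text))
    return out
```

Groups that already carry a category, such as quote() or capital(), pass through unchanged, which reflects the stated order of classification.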

Finally, at step 240, the first word of each classified
quote() is examined to determine if it is capitalized. If it is,
it is converted to lowercase and it is compared to the
respective lists to determine if it can be classified as a
throw(), prep() or join(). If it can be, it is removed from the
quote() and re-classified accordingly. A similar process is
performed for the first word of each classified capital().

The next step of parsing process 100 is step 112, in FIG.
2, wherein the classified words and/or phrases are manipulated
to extract the most relevant terms therefrom. Step 112
is described with reference to FIGS. 4a through 4e, which
illustrate sub-steps of step 112. Specifically, at step 250, a
check is first performed to ensure that search data 28
contains at least one group (either a word or phrase) which has been
classified as other than throw(). If all groups in search data
28 are classified as throw(), an error message is presented to
the user at step 254 instructing them to rewrite their search data.
Otherwise, all groups in search data 28 which have
been classified as throw()'s are discarded at step 258.
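
A minimal Python sketch of steps 250 through 258 follows. Signalling the error of step 254 with an exception, and the names NoContentError and discard_throw_groups, are choices made for this illustration only; the description says only that an error message is presented to the user.

```python
from typing import List, Optional, Tuple

Group = Tuple[Optional[str], str]  # (category, text)


class NoContentError(ValueError):
    """Raised when every group in the query is a throw() group."""


def discard_throw_groups(groups: List[Group]) -> List[Group]:
    """Drop throw() groups, or fail if nothing else remains."""
    kept = [group for group in groups if group[0] != "throw"]
    if not kept:
        # Corresponds to the error message of step 254.
        raise NoContentError("Please rewrite your search data.")
    return kept
```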

Next, a determination is made at step 262 as to whether
the first remaining group in search data 28 is classified as
phrase(). If the first remaining group is classified phrase(),
then a determination is made at step 266 as to whether any
group exists in search data 28 which has been classified as
capital() or quote() and which is not immediately preceded
with a group classified as prep() or join(). If one or more
such groups are present in search data 28, the first such
group's classification is changed at step 270 to rank1(). If, at
step 266, it is determined that no such group exists in search
data 28, the classification of the first group is changed from
phrase() to rank1() at step 274.
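
The branch taken when the first remaining group is a phrase (steps 262 through 274) might be sketched in Python as follows. The helper names and the (category, text) data layout are assumptions of this sketch, the top-rank category is written here as the string "rank1", and the other branches of the manipulation step are deliberately not covered.

```python
from typing import List, Optional, Sequence, Tuple

Group = Tuple[Optional[str], str]  # (category, text)


def first_unpreceded(groups: List[Group], kinds: Sequence[str]) -> Optional[int]:
    """Index of the first group of one of `kinds` that is not immediately
    preceded by a prep() or join() group, or None if there is none."""
    for i, (category, _) in enumerate(groups):
        preceded = i > 0 and groups[i - 1][0] in ("prep", "join")
        if category in kinds and not preceded:
            return i
    return None


def promote_when_phrase_first(groups: List[Group]) -> List[Group]:
    """Pick the top-rank group when the first group is a phrase()."""
    groups = list(groups)
    if not groups or groups[0][0] != "phrase":
        return groups  # handled by other branches, not sketched here
    target = first_unpreceded(groups, ("capital", "quote"))  # steps 266, 270
    if target is None:
        target = 0  # step 274: fall back to the leading phrase()
    groups[target] = ("rank1", groups[target][1])
    return groups
```

For example, a leading phrase followed directly by a capital() group promotes the capital() group, whereas a capital() group that follows a prep() group is skipped and the leading phrase is promoted instead.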

If, at step 262, the first remaining group is not classified
phrase() then a determination is made at step 278 as to
whether the first remaining group in search data 28 is
classified number(), adject(), or qword(). If the first remaining
group is one of these classifications, a determination is
made at step 282 as to whether any group exists in search
data 28 which has been classified as capital() or quote() and
which is not immediately preceded with a group classified as
prep() or join(). If one or more such groups are present in
search data 28, the first such group's classification is
changed, at step 286, to rank1().

If, at step 282, no such group classified as capital() or
quote() exists, a determination is made at step 290 as to
whether there is any remaining group in search data 28
which is classified phrase(). If there is at least one such
group, the classification of the first of these groups is
changed to rank1() at step 294.

If, at step 290, there is no such group then a determination
is made, at step 298, as to whether there is a remaining group
in search data 28 which is classified number() or adject(). If
there is at least one such group, the classification of the first
of these groups is changed to rank1() at step 302.

If, at step 298, there is no such group, the first remaining
group in search data 28, which was classified qword(), is
changed at step 306 to a rank1() classification.
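
The fallback chain of steps 278 through 306 lends itself to a short Python sketch. As before, the (category, text) layout, the function name and the spelling of the top-rank category as the string "rank1" are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Group = Tuple[Optional[str], str]  # (category, text)
BLOCKERS = ("prep", "join")


def promote_when_modifier_first(groups: List[Group]) -> List[Group]:
    """Pick the top-rank group when the first group is a number(),
    adject() or qword()."""
    groups = list(groups)
    if not groups or groups[0][0] not in ("number", "adject", "qword"):
        return groups  # other branches are not covered by this sketch

    def first(kinds: Sequence[str], skip_blocked: bool = False) -> Optional[int]:
        for i, (category, _) in enumerate(groups):
            blocked = i > 0 and groups[i - 1][0] in BLOCKERS
            if category in kinds and not (skip_blocked and blocked):
                return i
        return None

    target = first(("capital", "quote"), skip_blocked=True)  # steps 282, 286
    if target is None:
        target = first(("phrase",))                          # steps 290, 294
    if target is None:
        target = first(("number", "adject"))                 # steps 298, 302
    if target is None:
        target = 0                                           # step 306
    groups[target] = ("rank1", groups[target][1])
    return groups
```

The final fallback relies on the observation that, if no number() or adject() group exists, a first group that passed the test of step 278 can only be a qword().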

If, at step 278, it was determined that the first remaining
group in search data 28 was not classified number(), adject()
or qword(), then a determination is made at step 310 (in FIG.
4b) as to whether the first remaining group is classified as
capital() or quote(). If the first remaining group is classified
as capital() or quote(), it is changed to a classification of
rank1() at step 314.

If, at step 310, the first remaining group is not classified
as capital() or quote(), then a determination is made at step
318 as to whether the first remaining group is classified as
prep(). If it is, then at step 322 a determination is made as
to whether any group exists in search data 28 which has been
classified as capital() or quote() and which is not immediately
preceded with a group classified as prep() or join(). If
one or more such groups are present in search data 28, the
first such group's classification is changed at step 326 to
rank1().

If, at step 322, it is determined that no group classified as
capital() or quote() exists in search data 28 that is not
immediately preceded by a group classified prep() or join(),
then at step 330 a determination is made as to whether any
group exists in search data 28 which has been classified as
phrase() which is not immediately preceded with a group
classified as prep() or join(). If one or more such groups are
present in search data 28, the first such group's classification
is changed at step 334 to rank1().

If, at step 330, it is determined that no group classified as
phrase() exists in search data 28 that is not immediately
preceded by a group classified prep() or join(), then at step
338 a determination is made as to whether any group exists
in search data 28 which has been classified as number() or
adject() and which is not immediately preceded with a group
classified as prep() or join(). If one or more such groups are
present in search data 28, the first such group's classification
is changed at step 342 to rank1().

If, at step 338, it is determined that no group classified as
number() or adject() exists in search data 28 that is not
immediately preceded by a group classified prep() or join(),
then at step 346 a determination is made as to whether any
group exists in search data 28 other than groups classified
prep() or join(). If no such other groups remain in search data
28, then an error message is presented to the user at step 350.
If such other groups do exist, then the first group of quote(),
capital(), number(), adject(), or qword() classification is
changed to rank1() at step 354.

If, at step 318, the first group is not classified as prep(),
then at step 358 (FIG. 4c) a determination is made as to
whether the first remaining group is classified as a join(). If
it is, this group is deleted from search data 28 at step 362 and
processing reverts to step 262.

At step 366 (FIG. 4d), the first remaining group in search
data 28 is selected for examination. At step 370, a determination
is made as to whether the group is a phrase(),
number(), capital() or quote() classification and whether it is
immediately preceded by a group which is a join(). If these
conditions are met by the group being examined and if the
join() which precedes the group is in turn preceded by a
group classified as rank1(), then at step 374 the classification
of the group is changed to also be rank1(), i.e. rank1(IBM)
join(and) phrase(compiler) becomes rank1(IBM) join(and)
rank1(compiler).

If the conditions of step 370 are not met by the group, at
step 378 a determination is made as to whether the group is
an adject() which is immediately preceded by a join() and,
if so, if the group which immediately precedes that join() is
classified as rank1(). If these conditions are met, then the
first following group which was classified phrase(), number(),
capital() or quote() is changed to rank1() at step 382.

If the conditions of step 378 are not met, at step 386 a
determination is made as to whether the group is classified
as a phrase(), number(), capital() or quote() and if it is
immediately followed by a group classified as join() which
is in turn immediately followed by a group which is classified
as rank1(). If these conditions are met, then the
classification of the group is changed to rank1() at step 390.

At step 394, a determination is made as to whether all of
the groups in search data 28 have been considered. If not, the
next group is selected for consideration at step 398 and
processing returns to step 370. Once, at step 394, it is
determined that all remaining groups have been considered,
processing continues at step 402 (FIG. 4e).

At step 402, apostrophe s's ('s) are deleted, if present,
from each non-join() group. Next, at step 406, the first
remaining group which is not classified as a join() is
examined. A determination is made at step 410 as to whether
the group which immediately precedes this group is a
join(or). The term "join(or)" refers to the word 'or', from
Table 3 of Appendix A, which will have been classified as a
join(). If the condition at step 410 is true, then at step 414
any other join()'s which immediately precede the join(or)
are removed.

Next, at step 418, a determination is made as to whether
the immediately preceding non-join() is an or(). An or() is a
classification which combines alternatives separated by
"or". For example, search data, "A or
B or C" is re-expressed as or(A B C) for efficiency and
convenience reasons. If, at step 418, the immediately preceding
non-join() is an or(), then the preceding or(), the
join(or) and the non-join() groups are combined
at step 422. If the preceding non-join() is not an or(),
processing continues at step 426 wherein the preceding
non-join(), the join(or) and the non-join() group are combined
to form a new or().

As an example of steps 418, 422 and 426, given search
data which has been classified as "phrase(pass) prep(from)
capital(Tinkers) join(or) capital(Evert) join(or)
capital(Chance)", when group capital(Evert) is processed at step
418, the processing will proceed to step 426. At step 426, the
search data is combined to read, "phrase(pass) prep(from)
or(capital(Tinkers), capital(Evert)) join(or)
capital(Chance)". Next, when processing group
"capital(Chance)" at step 418, the processing will proceed to step
422 wherein the search data is combined to read,
"phrase(pass) prep(from) or(capital(Tinkers), capital(Evert),
capital(Chance))".

At step 428, a determination is made as to whether all
non-join() groups have been considered and, if not, the next
group is selected at step 432 and processing reverts to step
405. If, at step 428, it is determined that all non-join() groups
have been considered, the manipulation process is complete,
as indicated at 436.

Once the manipulation of the classified words at step 112
is complete, step 116 of parsing process 100 is performed to
complete the process. Specifically, in step 116 an examination
is performed on each remaining group in search data 28
to determine groups which can advantageously be translated
and/or enhanced. A translation table (not shown) of words
and phrases and their preferred alternatives is maintained by
process 100 and the remaining groups in search data 28 are
compared to the entries in this table. For each match, the
matching group is replaced with the preferred alternative,
either explicitly or via a translation function.

For example, the translation table can contain an explicit
entry for "get in touch" for which a preferred alternative can
be "contact". Any group in search data 28 which contains the
phrase "get in touch" will have this phrase replaced by
"contact". As another example, the translation table can
contain a function to convert time-related words into
numeric equivalents. Specifically, any group in search data
28 containing the word "today" will have this word replaced
with the current date in an appropriate format, such as
dd/mm/yy. Similarly, whole numbers can be converted to
text form, i.e. "7" converted to "seven".

Finally, step 116 can perform a synonym expansion for
selected words and/or phrases. For example, the word
"discover" can be expanded to "discover or invent or find".

Referring again to FIG. 1, Natural Language Query
Processor 24 passes the processed search data 28' to meta
search engine 32. Meta search engine 32 receives processed
search data 28 and further processes it to place it into forms
suitable for the search engine or engines 36 which are
defined for the information sources to be searched. For
example, if the information sources to be searched are
WWW pages, search engines 36 can be appropriate search
engines such as Lycos, AltaVista, etc. Or, if a commercial
database is to be searched, such as Lexis, search engines 36
can be the database's proprietary search engine. In any case,
meta search engine 32 is responsible for assembling queries
which are appropriate to each search engine 36 from
processed search data 28.

In a present embodiment of meta search engine 32,
queries are assembled for three search engines 36, specifically
the AltaVista, Lycos and Excite search engines for
WWW pages. As will be apparent, fewer or more search
engines can be employed if desired. It is also contemplated
that different sets of search engines can be employed for
different subject matters. For example, general enquiries
may be passed to the set of three search engines mentioned
above, while an enquiry relating to legal issues may be sent
to any two of these search engines and to the Lexis database.
It is contemplated that the selection of an appropriate set of
search engines can either be performed explicitly by the
user, or implicitly by the search system 20, based upon
recognized keywords in the processed search data 28 or
other information such as the user's identity, location, etc.

As shown in FIGS. 5, 5a and 5b, at step 500 a set of search
engines is selected. As mentioned above, this can be either
an implicit selection (a default set) or an explicit selection
made by the user or by the search system 20. Next, at step
504, search data 28 is examined and all groups classified as
qword() are removed from the processed search data 28.
Next, at step 512, a database of search engine capabilities,

at step 520, search data 28 is simplified for such engines.
Specifically, as shown in FIG. 5b, at step 600 the groups in
search data 28 are sorted by classification, with the presently
preferred sort order being rank1(), or(), capital(), quote(),
phrase(), adject() and number(). Next, at step 604, the
contents of all of the groups are examined to remove
duplicate words in a group, or between groups.

At step 608, a number n is determined as being the
number of words remaining in search data 28, if less than
four, or the value log2 (number of words - 3). Next, at step
612, a determination is made as to whether the selected
search engine accepts an input representing the "number of
words to be matched" to have a 'hit'. If the engine does
support this input, as determined from the information in
database 512, then at step 616 the query is composed and
consists of all of the words and n. If, at step 612, the engine
does not support a "number of words to be matched" input
then at step 620 the query is composed and comprises the
first n words.

Where other search engines, such as Lexis, etc. are
included in the set of search engines, at step 524, search data
28 is appropriately simplified for such engines as will be
apparent, to those of skill in the art, in view of the above.

Referring again to FIG. 1, the simplified queries 38 from
meta-search engine 32 are dispatched to the corresponding
search engines 36 via suitable transmission means. For
example, if a search engine is accessible from a web page on
the internet, the query is sent to the URL for that web page
with the query being in the required format. As will be
apparent to those of skill in the art, the present invention is
not limited to internet and/or World Wide Web-based search
engines and any accessible search engine can be employed.
Examples of such search engines include, but are not
limited to, those accessible via a LAN, a dedicated telecom-
requirements and addresses (URL's or other appropriate munications line, a dial-up telecommunications link, etc., or 
address information) is consulted to determine the appro- even one or more search engines integral with system 20 can 
priate parameters for each search engine in the selected set all be employed with the present system, 
of search engines. pAt step 532 in FIG. 5, ^ hits' 42 (in FIG. 1) from each 
If one or more boolean-type search engines such as search engine are received by meta -search engine 32. These 
Excite, AltaVista, etc. are included in the set of search jhits are then passed to Search Results Filter 46 when results 
engines, at step 516, search data 28 is simplified for such lhave been obtained from all of the search engines in the set 
engines. |or when a predetermined time limit has been exceeded 
FIG. 5a shows a simplification for such boolean engines / without receiving resuhs from one or more search engine, 
wherein at step 550, the groups in search data 28 are sorted 45 The hits received by Search Results Filter 46 are generally 



by classification, with the presently preferred sort order 
being ranklQ, orQ, capitalQ, quoteQ, phraseQ, adjectQ and 
numberQ. At step 554, each orQ group is changed to the 
syntax required by the search engine, for example or(phrase 
(a), capital(b), phrase(c)) can be converted to (a or b or c). j^. 
At step 558, the first portion of the query for the boolean 
search engine is formed by combining all of the groups 
which were classified as ranklQ, separated by AND'S. 



in the form of an address, such as a URL, at which a relevant 
information source can be located and the identity of the 
search engine which returned the hit. Search Results Filter 
46 combines the hits from each search engine into a single 
list and removes redundancies. The culled fist of hits is 
placed into the format necessary to retrieve the individual 
information sources and this formatted list is transferred to 
Information Retrieval means 50. 



At step 562, a determination is made as to whether the From this formatted list, Information Retrieval means 50 

next remaining group is classified as capitalQ or quoteQ and, 55 retrieves the complete information sources 54 for each of a 

if it is, that group is added lo the query with an AND at step preselected maximum number of hits from each search 

566. If, at step 562, the next group is not a capitalQ or engine 36. For example, the first 10 hits from each engine, 

quoteQ, at step 570 multiple word phrases are split into after redundancies have been removed, may be retrieved, 

individual words and combined with OR's and the resulting The retrieved information sources are then examined by 

structure is add to the query with an AND. Next, at step 574, go the Selector means 58. Selector means 58 performs several 

all orQ's are added to the query with an AND and, at step functions, including ranking the relevancy of the informa- 

578, all remaining unique words in the search data are tion sources retrieved and identifying their relevant portions 

combined into a structure, wherein each word is separated for output to the user. 

by an OR, and the resulting structure is added to the query The process for ranking of the information sources 

with an AND. ^5 employs the processed search data 28 from Natural Lan- 

If one or more "word-only" type search engines such as guage Query Processor 24. Specifically, as illustrated at step 

Lycos, HotBot, etc. are included in the set of search engines, 680 of FIG. 6, a scoring regime is established for the 
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retrieved information sources relative to the processed The result of this calculation is then multiplied by the 

search data 28 and a score table is created to hold determined value logjo (x)^"^, where x is the previously determined 

scores for each information source. A presently preferred length of the segment. This latter step weights the result 

scoring regime is given in Appendix B. In this regime, each against segments which are relatively small. Finally, the 
group in processed search data 28 is treated as a separate 5 result of this calculation is divided by the value l+log^o (y), 

candidate and separate totals are maintained for each can- where y is the difi[erence between the number of matches in 

didate in the score table. An example of processed search the candidate with the greatest number of matches and the 

data 28 which reads, "or(phrase(contact), phrase(personnel), average number of matches for the other candidates, how- 

phrase(names)); phrase(people); rankl(Gravis); rankl ever if the value of y is determined to be less than one, it is 
(Logitech))" has four candidates. lo set at one. This calculation is intended to weight the result 

At step 684, the processed search data 28 is augmented by against segments with a high number of matches in just a 

adding the following to processed search data 28: for each ^w candidates and few matches in the remaining candi- 

group with multiple word phrases, create another group dales. The result of all of these calculations is the resultant 

wherein the first word is capitalized, (i.e. — for phrase(big segment score. 

sky) create group phrase(Big sky)); for each group with ^5 a step 724, a determination is made as to whether all 

multiple word phrases, create another group wherein each matches in an information source have been considered. If 

word is capitalized, (i.e. — for phrase(big sky) create group unconsidered matches exist, the next three consecutive 

phrase(Big Sky)); for each group with multiple words, matches are selected for consideration as a segment at step 

including any capitalized groups created in the preceding 728. In the event that less than three unconsidered matches 
steps, another group is created by replacing spaces in the 20 exist, a segment of three is formed at step 728 by "padding", 

group with +'s (i.e. — for phrase(Mickey Mouse), create namely by taking the last three consecutive matches, even if 

phrase(Mickey+Mouse)); and for each word, whether in a one or two of these matches have previously been consid- 

single word group or a multi-word group, make new words ered. Processing then commences again at step 716, 

by capitalizing them. For example, the phrase(mickey If, at step 724, it is determined that all matches have been 
mouse pluto) becomes phrase(Mickey), phrase (Mouse) and 25 considered, the two segments with highest scores are 

phrase(Pluto). Each of these created groups is then added to selected at step 732. It will be apparent to those of skill- in 

the score table, with a score for any of these groups being the art that, in the event that only a single segment exists in 

considered a score for the candidate, i.e. — a match with the an information source, processing will proceed from step 

augmented phrase(Mickey mouse) is scored for the phrase 734 to step 764, described below. 

(mickey mouse). ^t step 740, as shown in FIG. 6b, the first of the two 

Next, at step 688, a first retrieved information source is highest scoring segments is selected. At step 744, the 

selected. At step 692, the information source is examined to selected segment is augmented by adding the immediately 

determine each match between its contents and the groups in preceding match (if any) to form an augmented segment. As 

the score table. For each match, an entry is made in the score referred to herein, a segment is merely a first offset from the 

table for the corresponding candidate including the score start of the information source defining the start location of 

assigned the match under the selected scoring regime and the portion of the information under consideration and a 

the location of the match within the information source. second offset defining the end of the portion of interest in the 

Next, at step 696, the matches are sorted by their location information source. Thus, in step 744, the augmentation is 

within the information source. At step 700, a determination accomplished by moving the first offset appropriately, 

is made as to whether more than three matches were found towards the start of the information source. Similarly, when 

within the information source. If three or fewer matches a segment is "scanned" or otherwise processed, the infor- 

were found, the information source is assigned a rank of zero mation source is actually being considered, between the two 

at step 704 and, if at step 706 it is determined that one or offsets. 

more information sources remain to be considered, the next Steps 716 and 720 are then performed again on this 

information source is selected at step 708 and processing augmented segment. At step 748, the selected segment is 

returns to step 692. augmented by adding the immediately following match (if 

If at step 700 it is determined that more than three matches any) to foran a second augmented segment and steps 716 and 

have been found in the information source, processing 720 are then performed again. 

proceeds to step 712, shown in FIG. 6aj wherein the first 50 At step 752, a determination is made as to whether the 

three consecutive matches are selected for further consider- resulting score of either of these augmented segments is 

ation. At step 716, a table is established with an initial score higher than the previous score for the segment. If at least one 

value for each candidate. An example of a table of presently score is higher, the augmented segment with the highest 

preferred initialization values is given in Appendix C. score is selected at step 756 and steps 744 through 752 are 
At step 720, the scores are determined for the set of three 55 performed again on the selected augmented segment, 

hits, referred to herein as a segment. Specifically, these wherein the selected augmented segment is augmented to 

segment scores are determined by adding the scores of the form two new augmented segments which are scored and 

corresponding candidates in each match with the initial compared to the score of the selected augmented segment, 

score value for each respective candidate, from Appendix C, This process of augmenting, scoring and comparing con- 
to obtain total scores for each candidate for the segment, 60 tinues until it is determined, at step 752, that neither of the 

These candidate totals are then multiplied together, includ- augmented segments have a score higher than the score of 

ing candidates which were not represented in the segment the previously selected segment. Once this is determined, 

and which thus only have their initial value. This value is the previous selected segment is deemed to be the resuh for 

then divided by the length of the segment (i.e. the number of the segment at step 760. A determination is made a step 762 
characters, including white space, etc. between the start of 65 as to whether the second highest scoring segment from step 

the first match being considered and end of the last match 732 has been considered and, if not, processing proceeds 

being considered). from step 744 for that segment. If both segments have been 
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considered, then at step 764 the segment, whether aug- Next, at step 924, the "cleaned up" segment is scanned 

mented or not, with the highest score is deemed to be the again, from the updated start to the updated end, to close any 

segment of interest for the information source. "open" tags (i.e.— an open tag for which there is no corre- 

A determination is made at step 768 as to whether any sponding closing tag, e.g. <CAPTION> without a 
other information sources remain for which a segment of 5 </CAPTION>) by adding the corresponding closing tag and 

interest has not been determined and, if this is the case, to open any "dangling" tags (i.e. -closing tags without a 

processing reverts to step 708. Otherwise, processing pro- corresponding open tag) by adding the corresponding open 

ceeds to step 800, as shown in FIG. 6c. tag. As will be apparent to those of skill in the art, added 

At step 800 the final segment from each information . closing tags will be added to the end of the segment, in 

source is ranked in descending order, by their respective reverse order to the order the corresponding open tags are 

determined scores. At this point, it is likely that these encountered in the segment and added open tags are added 

segments define portions of their respective retrieved infor- to the beginning of the segment, in reverse order to the order 

mation sources which are incomplete to some extent, such as corresponding closing tags are encountered in the seg- 

only being portions of paragraphs and/or sentences. Further, ^^^^ 

if the information sources were World Wide Web pages, in . .» • . 

TTTAji r * • Li *u * TTT-**T * Next, at step 928, problematic tags are modified or 

HTML format. It IS possible that one or more HTML tags are , . , . ,T ■ . ^ 

™- • f ™ *u J * *u LI T removed m accordance with the table in Appendix F. 

missing irom the portions, rendermg them unparsable by an .„ 

HTMLbrowser. Accordingly, at step 804, the final segments Specifically, the segment is checked for any filenames 

are "cleaned up". As this clean up process proceeds, the 20 ^'^^"^ "^'^^ ^''''^ ^ ^^"^ °' ^^^^"^ '^^'''^ 

information source retrieved is modified, if necessary, by expressed m with relative names, i.e.^ot with full univer- 

moving, adding or deleting information therein. ^al resource locators (URL^s). Any such filenames are 

Specifically, if the retrieved information sources are converted to absolute names, with full URUs. Tags listed in 

HTML formatted files, then the retrieved information source (2) of Appendix F are removed from the information source, 
is scanned, as indicated at step 900 in FIG. 7, to determine 25 along with their contents, and the segment start and end are 

if a <BODY> tag is present within the portion of the updated appropriately. Tags listed in (3) of Appendix F are 

retrieved information source which is between the start and removed from the information source, leaving their contents, 

end points defined by the segment. If no such tag is present. Finally, the specific tags listed in (4) of Appendix F are 

then at step 904, the retrieved information source is scanned, altered in the indicated manner. 

commencing at the start defined by the segment and working At step 932, each URL (hot Hnk) within the segment is 

towards the beginning of the retrieved information source, checked to confirm that it links to a vaHd/existing informa- 

for HTML tags. For each tag encountered, the actions listed tion source. If a URL does not link to a valid information 

in the table in Appendix D are performed accordingly. For source, the URL is "unlinked", but its text is left in place. If 
example, if a </CODE> tag is encountered, the tag is moved 35 the URL does link to a valid information source, a check is 

to the start of the seginent and the scan is continued. As performed to determine if one or more of the groups in 

another example, if a <DD> tag is encountered, the tag is processed search data 28 are present in the URL or in the 

moved to the start of the segment and the scan is stopped. As information source to which it points. If one or more groups 

another example, if a <TITLE> tag is encountered, the tag are present, this information source is retrieved by Informa- 

is not moved to the start of the segment and the scan stops. tion Retrieval means 50 and processed by Selector means 

In the absence of a tag which stops the scan, the scan 58. The final segment determined for this retrieved infor- 

terminates when the beginning of the retrieved information mation source is ranked against the final segments previ- 

source is encountered. ously determined for the other retrieved information sources 

If, at step 908, a <BODY> tag is present, the segment start 45 and is added to the sorted final segments obtained at step 

is updated to exclude the tag and all material before it. 800. 'ITie clean up operations of step 804 are then performed 

Next, a determination is made at step 912 as to whether on this latest, final segment, 

the segment includes a </BODY> tag. If no such tag is The retrieval of information sources which are linked to 

present, then at step 916, the retrieved information source is previously retrieved information sources is limited to a 

scanned, commencing at the end defined by the segment and preselected number of levels of recursion. It is contemplated 

working towards the end of the retrieved information source, that this number of levels of recursion will be a selectable 

for HTML tags. For each tag encountered, the actions listed parameter, although a suitable number of levels of recursion 

in the table in Appendix E are performed accordingly. For can be specified as a fixed default, if desired. In a present 
example, if a </CAPTION> tag is encountered, the tag is 55 embodiment of the invention, no recursion (zero levels) is 

moved to the end of the segment and the scan is continued. the selected default., but it is contemplated that more levels 

As another example, if a </LI> tag is encountered, the tag is may be desired in other circumstances, 

moved to the end of the segment and the scan is stopped. As if, at step 804, the information source contains only text, 

another example, if an <ADDRESS> tag is encountered, the i.e.— is not an HTML document, then the clean up proceeds 
tag is not moved to the end of the segment and the scan ^° as shown in FIG. 8. Specifically, at step 950, the information 

stops. In the absence of a tag which stops the scan, the scan source is scanned, from the start defined by the segment to 

terminates when the end of the retrieved information source the start of the information source, until the first blank hne 

is encountered. is encountered or the start of the information source is 

If, at step 912, it is determined that the segment does 55 reached. If, as determined at step 954, a blank line was 

include a </BODY> tag, the segment end is updated to encountered, the segment start is updated at step 958 to 

exclude the tag and all of the material following it. include all material up to the blank line. If, as determined at 
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step 954, the start of the information source is encountered, will be considered. Appendix G shows the actual HTML 

the segment start is updated at step 962 to include all pages returned by each search engine and Appendix H 

material up to the beginning of the source. shows the list of URL's which have been extracted from the 
Next, at step 966, the information source is scanned, from ^ pages in Appendix G, after obvious redundancies have been 

the end defined by the segment to the end of the information eUminated. In the Appendix, the URL's located by the 

source, until the first blank line is encountered or the end of AltaVista engine are identified with a «av#r' prefix, those 

the information source is reached. If, as determined at step k„ fu^ t, ^ ^ ^ a a -.u «i uu,, 

ui 1 V _. u , . located by the Lycos engine are identified with a "ly##' 

970, a blank line was encountered, the segment end is - ^ / ^ ^ , - . . , , 

updated at step 974 to include all material down to the blank lO ^^'f " ^""^'^ ^"^me are identified 

line. If, as determined at step 966, the end of the information "^'^^ ' "^'^^^^ P'"^^* ^ ^^^'^ 

source is encountered, the segment end is updated at step redundant *hit' in the first twenty URL's located by 

978 to include all material down to the end of the source. AltaVista, resulting in only nineteen entries for AltaVista in 

As will be apparent to those of skill in the art, if infor- of Appendix G. Similarly, there were two redundant 

mation sources in formats other than text or HTML are ^^st twenty URL's located by Lycos, resulting 

retrieved, appropriate clean up operations will be performed, in only eighteen entires for Lycos in the list of Appendix G. 

as desired. In the cases wherein a redundancy is determined between 

As a final step of Selector means 58, the highest ranked, the hits returned by two or more search engines, the highest 
"cleaned up" segment is selected for output to the user, as is 20 ranked hit is retained and the other hit or hits are removed 

each cleaned up segment whose score is no less than a from the search engine results wherein they were lower 

preselected level. In a present embodiment of the invention, scored. For example, if the Lycos search engine ranked a hit 

up to the ten highest scoring segments whose scores are ,s being number two and Excite ranked the same hit as being 

greater than 0.01 are output to the user as a first set and a ^^^^^^ ^^^^ ^^^^^^^ ^^^^^ ^^^^ 
second set of up to the next ten highest scoring segments , , 

whose scores are greater than 0.01 are also available for "^^^er seven, the Lycos hit is retained and the other two hits 
output to the user. As will be apparent to those of skill in the ^'"""^ 

art, the selection of this output criteria is arbitrary and may Information retrieval means 50 then retrieves each of the 
be varied as desired but this criteria has been found to 30 information sources Usted in Appendix G, if possible, and 

provide reasonable results. ^^^^ ^^^^-^^^^ information sources are processed by Selec- 

Output device 62 then outputs the portions 66 of the ^^ans 58 to obtain the list of cleaned up final segments 

cleaned up information sources indicated by the selected ^^own in Appendix I. This list includes the URL to retrieve 
segments to the user. In a present embodiment of the „ . r *u * j j • ex. 

, . * * • 1 J .J ... .J 35 the information source, the start and end points of the 

invention, the output portions include a header which iden- , ^ ^ . , , 

tifies the ranking of the portion, a link (URL) to the original "^^'"'^ ^^"^'"^ (expressed as byte offsets from the 

information source (if appropriate), a number indicating the ^egmnmg of the mformation source), and the score assigned 

size of the original information source and a link (if ^° information source by Selector means 58. 
appropriate) to the Search Engine 36 with which the infor- 40 Appendix J shows the formatted text (converted from the 
mation source was found. HTMLcode) of twoof the information sources retrieved 

An example of the operation of an embodiment of the t^e information source listed in Appendix G and 

present invention is given below. In the example, the user * .^a-^ v u *u « i . r .t, • r 

u . J T w . , r,' , . . Appendix K shows the final segments from these informa- 

has entered "Where do Monarch butterflies spend the win- ..... 

o» ♦u NT 4 1 T c u * io lion sourccs, as output to the user by output means 62. 

ter? as the Natural Language Search Data 28, The pro- » j f 

cessed search data from the Natural Language Query Pro- As discussed above, the present invention allows a user to 

cesser 24 is "rankl(Monarch) phrase(butterflies spend) input a natural language query, search-multiple-and ^diverse 

phrase (winter)" and this is passed to meta search engine 32. datahase^;reffie^^ 

In this example, a set of search engines 36 has been are^deeDgd^lcv^t'to the liser's quer y and t o^extractJl^e 

previously selected and includes the Lycos, AltaVista and ^rejevanrp^^nsoT^^ 

Excite engines. Meta search engine 32 simplifies the pro- and'-prieserit^fiemlorth'eliser. It is contemplated that the 

cessed search data 28 for each search engine in the set to pr^^^i^^^^rnion^^Pa^^ the user by culling many 
obtain simplified search data appropriate to each engine. 55 information sources which are not relevant to the query and 

Specifically, for the Lycos engine, the search data which is ^y extracting the relevant portions of the relevant informa- 

dLspatched is, "Monarch+butterilies+spend+winter". For the ^-^^ sources. Thus, the user wiU be presented with a concise 

^^T)u r^'fl-^ 'T'' '^f '/k!;.'*^ ' r^r'^'^t ^^^^'^^ information which is relevant to the original 
AND+(butterflies+OR+spend)+AND+(winter) Monarch • ^ 

ranked first". Finally, for the Excite search engine, the search ^"^0'- 

data is, "Monarch+AND+(butterflics+OR+spend)+AND+ The above-described embodiments of the invention are 

(winter)". ITiis search data is appropriately combined with intended to be examples of the present invention and alter- 

the URL for each respective search engine and is transmitted ations and modifications may be effected thereto, by those of 

to the search engine. skill in the art, without departing from the scope of the 

Again, in this example it has been previously decided that invention which is defined solely by the claims appended 

no more that the first twenty 'hits' from each search engine hereto. 
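The query strings in the worked example can be reproduced with a short sketch. The following is illustrative only and is not the patented implementation: the tuple representation of a group and the function names are assumptions, the groups are assumed to be already sorted by classification, and the handling of or() groups (step 574), of remaining unique words (step 578) and of the number n (step 608) is omitted.

```python
from typing import List, Tuple

# (classification, text) pairs, e.g. ("rank1", "Monarch"). This in-memory
# form of the processed search data is a hypothetical choice for the sketch.
Group = Tuple[str, str]


def build_boolean_query(groups: List[Group]) -> str:
    """Compose an AND/OR query in the manner of steps 558 to 570."""
    # Step 558: every rank1 group, joined by AND, forms the first portion.
    parts = [text for cls, text in groups if cls == "rank1"]
    for cls, text in groups:
        if cls == "rank1":
            continue
        if cls in ("capital", "quote"):
            # Step 566: capital and quote groups are added unchanged.
            parts.append(text)
        else:
            # Step 570: other groups are split into words joined by OR.
            parts.append("(" + " OR ".join(text.split()) + ")")
    return " AND ".join(parts)


def build_word_query(groups: List[Group]) -> str:
    """Flatten all groups into unique words for a word-only engine."""
    seen, words = set(), []
    for _cls, text in groups:
        for word in text.split():
            if word not in seen:  # step 604: drop duplicate words
                seen.add(word)
                words.append(word)
    return " ".join(words)


def encode(query: str) -> str:
    """Replace spaces with '+', as in the query strings of the example."""
    return query.replace(" ", "+")


groups = [("rank1", "Monarch"), ("phrase", "butterflies spend"), ("phrase", "winter")]
print(encode(build_boolean_query(groups)))  # Monarch+AND+(butterflies+OR+spend)+AND+(winter)
print(encode(build_word_query(groups)))     # Monarch+butterflies+spend+winter
```

For the Monarch example, the two printed strings match the Excite and Lycos search data given in the description.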
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TABLE 1 



Appendix A 



I'm after


I need 


all about 


all of 


all on 


any of 


available about 


available at 


available in 


available on 


available to 


both of 


example from 


example of


examples from 


examples of 


find out 


Find Out 


find out about 


Find out about 


go about 


happened at 


happened in 


happened on 


happened to 


happens at 


happens in 


happens on 


happens to 


has in 


have in 


how much 


How much 


included in 


information about 


information from 


information on 


know about 


known about 


list of 


lists of 


mention of 


more about 


more of 


more on 


name of 


names of 


need to know 


overview of 


some of 


summary about 


summary from 


summary of 


summary on 


taken by 


tell me about 


Tell me about 


the heck 


things about 


we're after 


We're after 









TABLE 5-continued

seven
short
shorter
shortest
six
slow
slowest
spring
standard
summer
ten
three
two
unique
unofficial
various
violet
white
widely
winter
worse
worst
yellow

TABLE 6

Appendix A

What
When
Where
Who
How
Why
what
when
where
who
how
why



TABLE 2 



Appendix A 



I


I'd 


I'll 


I'm 


a 


all 


am 


an 


any 


are 


as 


ask 


available 


be 


been 


being 


both 


but 


call 


called 


can 


can't 


did 


do 


does


doesn't 


don't 


example 


examples 


explain 


find 


following 


get 


gets 


getting 


give 


got 


gotten 


had 


happened 


happens 


has 


have 


he 


her 


hers 


him 


his 


how 


how's 


if 


include 


included 


includes 


including 


info 


information 


irregardless 


is 


it 


it's 


its 


know 


like 


list


lists 


look 


many 


may 


me 


mention 


might 


more 


my 


name 


named 


names 


no 


nor 


not 


one 


our 


ours 


overview


really


regardless


she


should


show


such 


summary 


take 


taken 


tell 


some 


that 


the 


their 


them 


there 


tells 


they 


thing 


things 


this 


took 


there's 


was 


we 


we'd 


we'll 


were 


want 


what's 


when 


when's 


where 


Where's 


what 


who's 


whose 


why 


why's 


will 


who 


would 


you 


your 






won't 



TABLE 3

Classification    Whole Group    Individual Words in Group
rank1()           10             10/(n + 1)
quote()           7              2/(2*n + 1)
capital()         7              7/(2*n + 1)
phrase()          5              5/(n + 1)
number()          3              n/a
qword()           3              n/a
adject()          3              3/(n + 1)

where n is the number of individual words in a group.

Notes:

(1) If a group is classified as rank1(), capital() or quote(), do not count any words in groups classified as prep(), join() or throw() which are within the group.

(2) For a group classified as or(), each group within the or() group is scored separately in accordance with the regime above and the individual scores are added and maintained for the candidate as a whole.
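As an illustration, the scoring table above can be expressed as a lookup. This is a minimal sketch under stated assumptions: the names `WHOLE_GROUP_SCORE` and `individual_word_score` are hypothetical, the formulas are transcribed as they are printed in the table (including the quote() entry), and the special handling in the notes for prep(), join(), throw() and or() groups is not modelled.

```python
from typing import Optional

# Score awarded when the whole group matches, per classification.
WHOLE_GROUP_SCORE = {
    "rank1": 10, "quote": 7, "capital": 7, "phrase": 5,
    "number": 3, "qword": 3, "adject": 3,
}

# Score for a match on an individual word of a group of n words.
# number() and qword() have no per-word score ("n/a" in the table).
_WORD_SCORE = {
    "rank1": lambda n: 10 / (n + 1),
    "quote": lambda n: 2 / (2 * n + 1),    # numerator as printed in the table
    "capital": lambda n: 7 / (2 * n + 1),
    "phrase": lambda n: 5 / (n + 1),
    "adject": lambda n: 3 / (n + 1),
}


def individual_word_score(classification: str, n: int) -> Optional[float]:
    """Return the per-word score, or None where the table says n/a."""
    formula = _WORD_SCORE.get(classification)
    return None if formula is None else formula(n)


# A two-word phrase: 5 for a whole-group match, 5/3 for each single word.
print(WHOLE_GROUP_SCORE["phrase"], individual_word_score("phrase", 2))
```

Keeping the whole-group and per-word scores in separate tables mirrors the two columns of the regime and makes the n/a entries explicit.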



Appendix C

Classification    Initial Value
rank1()           0.3
capital()         0.4
quote()           0.4
phrase()          0.5
number()          0.7
adjective()       0.7
qword()           0.9
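The initial values above feed the segment score computed at step 720. The sketch below is one possible reading of that calculation, not the patented implementation. Its assumptions: matches are already sorted by location, the segment length x is greater than one character, and the exponent applied to log10(x) is exposed as the hypothetical parameter `length_exponent`, because that exponent is not legible in the description. The tuple layout and all names are illustrative.

```python
import math
from typing import Dict, List, Tuple

# Initial values per classification, as listed in the table above.
INITIAL_VALUE = {"rank1": 0.3, "capital": 0.4, "quote": 0.4, "phrase": 0.5,
                 "number": 0.7, "adjective": 0.7, "qword": 0.9}

# (candidate name, match score, start offset, end offset); hypothetical layout.
Match = Tuple[str, float, int, int]


def segment_score(matches: List[Match], candidates: Dict[str, str],
                  length_exponent: float = 1.0) -> float:
    """Score one segment; `candidates` maps candidate name to classification."""
    # Each candidate starts from its initial value, matched or not.
    totals = {name: INITIAL_VALUE[cls] for name, cls in candidates.items()}
    counts = {name: 0 for name in candidates}
    for name, score, _start, _end in matches:
        totals[name] += score
        counts[name] += 1
    # Multiply the candidate totals together.
    product = math.prod(totals.values())
    # Segment length: start of the first match to the end of the last match.
    length = matches[-1][3] - matches[0][2]
    # y: matches in the busiest candidate minus the average for the others,
    # set to one when it would be less than one.
    top = max(counts.values())
    others = sorted(counts.values())[:-1]
    average = sum(others) / len(others) if others else 0.0
    y = max(1.0, top - average)
    weighted = product / length * math.log10(length) ** length_exponent
    return weighted / (1 + math.log10(y))


candidates = {"monarch": "rank1", "winter": "phrase"}
matches = [("monarch", 10.0, 0, 7), ("monarch", 10.0, 50, 57), ("winter", 5.0, 93, 100)]
print(segment_score(matches, candidates))
```

With the default exponent of 1, the example totals are 20.3 and 5.5, the length is 100 and y is 1, so the score is approximately 2.233.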









between
by
about
around
at
for
from
in
into
of
on
onto
over
to
unto
with

TABLE 5

antique
architectural
bad
best
better
big
bigger
biggest
black
blue
brown
different
eight
electronic
fall
fast
fastest
five
four
good
green
grey
high
highest
high-tech
identical
large
larger
largest
least
little
long
longest
longer
low
lowest
most
natural
nine
official
one
orange
poor
purple
red

Tag              Action                    Continue/Stop Scan
<!-- >           Include this tag.         Continue.
</A>             Include this tag.         Continue.
</ADDRESS>       Don't include this tag.   Stop.
</APPLET>        Don't include this tag.   Stop.
</B>             Include this tag.         Continue.
</BANNER>        Don't include this tag.   Stop.
</BIG>           Include this tag.         Continue.
</BLINK>         Include this tag.         Continue.
</BLOCKQUOTE>    Don't include this tag.   Stop.
</BODY>          Don't include this tag.   Stop.
</BQ>            Don't include this tag.   Stop.
</CAPTION>       Include this tag.         Continue.
</CENTER>        Don't include this tag.   Stop.
</CITE>          Include this tag.         Continue.
</CODE>          Include this tag.         Continue.
</COMMENT>       Include this tag.         Continue.
</CREDIT>        Include this tag.         Continue.
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Appendix D 





</DD>           Don't include this tag.   Stop.
</DFN>          Include this tag.         Continue.
</DIR>          Don't include this tag.   Stop.
</DIV>          Don't include this tag.   Stop.
</DL>           Don't include this tag.   Stop.
</DT>           Don't include this tag.   Stop.
</EM>           Include this tag.         Continue.
[illegible]     Don't include this tag.   Stop.
[illegible]     Include this tag.         Continue.
[illegible]     Include this tag.         Continue.
[illegible]     Don't include this tag.   Stop.
[illegible]     Don't include this tag.   Stop.
</H1>           Include this tag.         Continue.
</H2>           Include this tag.         Continue.
</H3>           Include this tag.         Continue.
</H4>           Include this tag.         Continue.
</H5>           Include this tag.         Continue.
</H6>           Include this tag.         Continue.
</HEAD>         Don't include this tag.   Stop.
</HTML>         Don't include this tag.   Stop.
</I>            Include this tag.         Continue.
</IMG>          Include this tag.         Continue.
[illegible]     Include this tag.         Continue.
</KBD>          Include this tag.         Continue.
</LH>           Include this tag.         Continue.
</LI>           Don't include this tag.   Stop.
</LISTING>      Don't include this tag.   Stop.
</MAP>          Don't include this tag.   Stop.
</MARQUEE>      Don't include this tag.   Stop.
</MENU>         Don't include this tag.   Stop.
</MULTICOL>     Don't include this tag.   Stop.
</NOBR>         Don't include this tag.   Stop.
</NOFRAMES>     Include this tag.         Continue.
[illegible]     Include this tag.         Continue.
</NOTE>         Include this tag.         Continue.
</OBJECT>       Include this tag.         Continue.
</OL>           Don't include this tag.   Stop.
</OPTION>       Include this tag.         Continue.
</P>            Don't include this tag.   Stop.
[illegible]     Don't include this tag.   Stop.
</PRE>          Don't include this tag.   Stop.
</S>            Include this tag.         Continue.
</SAMP>         Include this tag.         Continue.
</SCRIPT>       Don't include this tag.   Stop.
</SELECT>       Include this tag.         Continue.
</SMALL>        Include this tag.         Continue.
</SPAN>         Include this tag.         Continue.
</STRIKE>       Include this tag.         Continue.
</STRONG>       Include this tag.         Continue.
</STYLE>        Don't include this tag.   Stop.
</SUB>          Include this tag.         Continue.
</SUP>          Include this tag.         Continue.
</TABLE>        Don't include this tag.   Stop.
</TD>           Include this tag.         Continue.
</TEXTAREA>     Include this tag.         Continue.
</TFOOT>        Don't include this tag.   Stop.
</TH>           Include this tag.         Continue.
</THEAD>        Don't include this tag.   Stop.
</TITLE>        Don't include this tag.   Stop.
</TR>           Don't include this tag.   Stop.
</TT>           Include this tag.         Continue.
</U>            Include this tag.         Continue.
</UL>           Don't include this tag.   Stop.
</VAR>          Include this tag.         Continue.
</WBR>          Include this tag.         Continue.
</XMP>          Don't include this tag.   Stop.
<A>             Include this tag.         Continue.
<ADDRESS>       Include this tag.         Stop.
<APPLET>        Include this tag.         Stop.
<AREA>          Include this tag.         Continue.
<B>             Include this tag.         Continue.
<BANNER>        Include this tag.         Stop.
<BASE>          Include this tag.         Continue.
<BASEFONT>      Don't include this tag.   Stop.
<BGSOUND>       Don't include this tag.   Stop.
<BIG>           Include this tag.         Continue.
<BLINK>         Include this tag.         Continue.
<BLOCKQUOTE>    Include this tag.         Stop.
<BODY>          Don't include this tag.   Stop.
<BQ>            Include this tag.         Stop.
<BR>            Include this tag.         Continue.
<CAPTION>       Include this tag.         Continue.
<CENTER>        Include this tag.         Stop.
<CITE>          Include this tag.         Continue.
<CODE>          Include this tag.         Continue.
<COL>           Include this tag.         Continue.
<COLGROUP>      Include this tag.         Continue.
<COMMENT>       Include this tag.         Continue.
<CREDIT>        Include this tag.         Continue.
<DD>            Include this tag.         Stop.
<DFN>           Include this tag.         Continue.
<DIR>           Include this tag.         Stop.
<DIV>           Include this tag.         Stop.
<DL>            Include this tag.         Stop.
<DT>            Include this tag.         Stop.
<EM>            Include this tag.         Continue.
<EMBED>         Include this tag.         Continue.
<FIG>           Include this tag.         Stop.
<FN>            Include this tag.         Continue.
<FONT>          Include this tag.         Continue.
<FORM>          Include this tag.         Stop.
<FRAME>         Include this tag.         Continue.
<FRAMESET>      Include this tag.         Stop.
<H1>            Include this tag.         Stop.
<H2>            Include this tag.         Stop.
<H3>            Include this tag.         Stop.
<H4>            Include this tag.         Stop.
<H5>            Include this tag.         Stop.
<H6>            Include this tag.         Stop.
<HEAD>          Don't include this tag.   Stop.
<HR>            Don't include this tag.   Stop.
<HTML>          Don't include this tag.   Stop.
<I>             Include this tag.         Continue.
<IFRAME>        Include this tag.         Continue.
<IMG>           Include this tag.         Continue.
<INPUT>         Include this tag.         Continue.
<KBD>           Include this tag.         Continue.
<LH>            Include this tag.         Continue.
<LI>            Include this tag.         Stop.
<LINK>          Include this tag.         Continue.
<LISTING>       Include this tag.         Stop.
<MAP>           Include this tag.         Stop.
<MARQUEE>       Include this tag.         Stop.
<MENU>          Include this tag.         Stop.
<MULTICOL>      Include this tag.         Stop.
<NEXTID>        Include this tag.         Continue.
<NOBR>          Include this tag.         Stop.
<NOFRAMES>      Include this tag.         Continue.
<NOSCRIPT>      Include this tag.         Continue.
<NOTE>          Include this tag.         Continue.
<OBJECT>        Include this tag.         Continue.
<OL>            Include this tag.         Stop.
<OPTION>        Include this tag.         Continue.
<OVERLAY>       Include this tag.         Continue.
<P>             Include this tag.         Stop.
<PARAM>         Include this tag.         Continue.
<PLAINTEXT>     Include this tag.         Stop.
<PRE>           Include this tag.         Stop.
<RANGE>         Include this tag.         Continue.
<S>             Include this tag.         Continue.
<SAMP>          Include this tag.         Continue.
<SCRIPT>        Include this tag.         Stop.
<SELECT>        Include this tag.         Continue.
<SMALL>         Include this tag.         Continue.
<SPACER>        Include this tag.         Continue.
<SPAN>          Include this tag.         Continue.
<SPOT>          Include this tag.         Continue.
<STRIKE>        Include this tag.         Continue.
<STRONG>        Include this tag.         Continue.
<STYLE>         Include this tag.         Stop.
<SUB>           Include this tag.         Continue.
<SUP>           Include this tag.         Continue.
<TAB>           Include this tag.         Continue.
[illegible]     Include this tag.         Stop.
[illegible]     Include this tag.         Stop.
[illegible]     Include this tag.         Continue.
[illegible]     Include this tag.         Continue.
<TFOOT>         Include this tag.         Stop.
<TH>            Include this tag.         Continue.
[illegible]     Include this tag.         Stop.
<TITLE>         Don't include this tag.   Stop.
<TR>            Include this tag.         Stop.
<TT>            Include this tag.         Continue.
<U>             Include this tag.         Continue.
<UL>            Include this tag.         Stop.
<VAR>           Include this tag.         Continue.
<WBR>           Include this tag.         Continue.
<XMP>           [illegible]               Stop.
Any other tags: Include this tag.         Continue.

APPENDIX E

Tag             Action                    Continue/Stop Scan
<!-- >          Include this tag.         Continue.
</A>            Include this tag.         Continue.
</ADDRESS>      Don't include this tag.   Stop.
</APPLET>       Don't include this tag.   Stop.
[illegible]     Include this tag.         Continue.
</BANNER>       Don't include this tag.   Stop.
</BIG>          Include this tag.         Continue.
[illegible]     Include this tag.         Continue.
</BLOCKQUOTE>   Don't include this tag.   Stop.
</BODY>         Don't include this tag.   Stop.
</BQ>           Don't include this tag.   Stop.
</CAPTION>      Include this tag.         Continue.
</CENTER>       Don't include this tag.   Stop.
</CITE>         Include this tag.         Continue.
</CODE>         Include this tag.         Continue.
</COMMENT>      Include this tag.         Continue.
</CREDIT>       Include this tag.         Continue.
</DD>           Don't include this tag.   Stop.
</DFN>          Include this tag.         Continue.
</DIR>          Don't include this tag.   Stop.
</DIV>          Don't include this tag.   Stop.
</DL>           Don't include this tag.   Stop.
</DT>           Don't include this tag.   Stop.
</EM>           Include this tag.         Continue.
</FIG>          Don't include this tag.   Stop.
</FN>           Include this tag.         Continue.
</FONT>         Include this tag.         Continue.
</FORM>         Don't include this tag.   Stop.
</FRAMESET>     Don't include this tag.   Stop.
</H1>           Include this tag.         Continue.
</H2>           Include this tag.         Continue.
</H3>           Include this tag.         Continue.
</H4>           Include this tag.         Continue.
</H5>           Include this tag.         Continue.
</H6>           Include this tag.         Continue.
</HEAD>         Don't include this tag.   Stop.
</HTML>         Don't include this tag.   Stop.
</I>            Include this tag.         Continue.
</IMG>          Include this tag.         Continue.
</IFRAME>       Include this tag.         Continue.
</KBD>          Include this tag.         Continue.
</LH>           Include this tag.         Continue.
</LI>           Don't include this tag.   Stop.
</LISTING>      Don't include this tag.   Stop.
</MAP>          Don't include this tag.   Stop.
</MARQUEE>      Don't include this tag.   Stop.
</MENU>         Don't include this tag.   Stop.
</MULTICOL>     Don't include this tag.   Stop.
</NOBR>         Don't include this tag.   Stop.
</NOFRAMES>     Include this tag.         Continue.
</NOSCRIPT>     Include this tag.         Continue.
[illegible]     Include this tag.         Continue.
</OBJECT>       Include this tag.         Continue.
</OL>           Don't include this tag.   Stop.
[illegible]     Include this tag.         Continue.
</P>            Don't include this tag.   Stop.
[illegible]     Don't include this tag.   Stop.
</PRE>          Don't include this tag.   Stop.
</S>            Include this tag.         Continue.
</SAMP>         Include this tag.         Continue.
</SCRIPT>       Don't include this tag.   Stop.
[illegible]     Include this tag.         Continue.
</SMALL>        Include this tag.         Continue.
</SPAN>         Include this tag.         Continue.
</STRIKE>       Include this tag.         Continue.
[illegible]     Include this tag.         Continue.
[illegible]     Don't include this tag.   Stop.
</SUB>          Include this tag.         Continue.
</SUP>          Include this tag.         Continue.
</TABLE>        Don't include this tag.   Stop.
</TD>           Include this tag.         Continue.
</TEXTAREA>     Include this tag.         Continue.
</TFOOT>        Don't include this tag.   Stop.
</TH>           Include this tag.         Continue.
</THEAD>        Don't include this tag.   Stop.
</TITLE>        Don't include this tag.   Stop.
</TR>           Don't include this tag.   Stop.
</TT>           Include this tag.         Continue.
</U>            Include this tag.         Continue.
</UL>           Don't include this tag.   Stop.
</VAR>          Include this tag.         Continue.
</WBR>          Include this tag.         Continue.
</XMP>          Don't include this tag.   Stop.
[illegible]     Include this tag.         Continue.
<ADDRESS>       Include this tag.         Stop.
<APPLET>        Include this tag.         Stop.
<AREA>          Include this tag.         Continue.
<B>             Include this tag.         Continue.
<BANNER>        Include this tag.         Stop.
<BASE>          Include this tag.         Continue.
<BASEFONT>      Don't include this tag.   Stop.
<BGSOUND>       Don't include this tag.   Stop.
<BIG>           Include this tag.         Continue.
[illegible]     Include this tag.         Continue.
<BLOCKQUOTE>    Include this tag.         Stop.
<BODY>          Don't include this tag.   Stop.
<BQ>            Include this tag.         Stop.
<BR>            Include this tag.         Continue.
[illegible]     Include this tag.         Continue.
[illegible]     Include this tag.         Stop.
<CITE>          Include this tag.         Continue.
<CODE>          Include this tag.         Continue.
<COL>           Include this tag.         Continue.
<COLGROUP>      Include this tag.         Continue.
<COMMENT>       Include this tag.         Continue.
<CREDIT>        Include this tag.         Continue.
<DD>            Include this tag.         Stop.
<DFN>           Include this tag.         Continue.
<DIR>           Include this tag.         Stop.
<DIV>           Include this tag.         Stop.
<DL>            Include this tag.         Stop.
<DT>            Include this tag.         Stop.
<EM>            Include this tag.         Continue.
<EMBED>         Include this tag.         Continue.
<FIG>           Include this tag.         Stop.
<FN>            Include this tag.         Continue.
<FONT>          Include this tag.         Continue.
<FORM>          Include this tag.         Stop.
<FRAME>         Include this tag.         Continue.
<FRAMESET>      Include this tag.         Stop.
<H1>            Include this tag.         Stop.
<H2>            Include this tag.         Stop.
<H3>            Include this tag.         Stop.
<H4>            Include this tag.         Stop.
<H5>            Include this tag.         Stop.
<H6>            Include this tag.         Stop.
<HEAD>          Don't include this tag.   Stop.
<HR>            Don't include this tag.   Stop.
<HTML>          Don't include this tag.   Stop.
<I>             Include this tag.         Continue.
<IFRAME>        Include this tag.         Continue.
<IMG>           Include this tag.         Continue.
<INPUT>         Include this tag.         Continue.
<KBD>           Include this tag.         Continue.
<LH>            Include this tag.         Continue.
<LI>            Include this tag.         Stop.
<LINK>          Include this tag.         Continue.
<LISTING>       Include this tag.         Stop.
<MAP>           Include this tag.         Stop.
<MARQUEE>       Include this tag.         Stop.
<MENU>          Include this tag.         Stop.
<MULTICOL>      Include this tag.         Stop.
<NEXTID>        Include this tag.         Continue.
<NOBR>          Include this tag.         Stop.
<NOFRAMES>      Include this tag.         Continue.
<NOSCRIPT>      Include this tag.         Continue.
<NOTE>          Include this tag.         Continue.
<OBJECT>        Include this tag.         Continue.
<OL>            Include this tag.         Stop.
<OPTION>        Include this tag.         Continue.
<OVERLAY>       Include this tag.         Continue.
<P>             Include this tag.         Stop.
<PARAM>         Include this tag.         Continue.
<PLAINTEXT>     Include this tag.         Stop.
<PRE>           Include this tag.         Stop.
<RANGE>         Include this tag.         Continue.
<S>             Include this tag.         Continue.
<SAMP>          Include this tag.         Continue.
<SCRIPT>        Include this tag.         Stop.
<SELECT>        Include this tag.         Continue.
<SMALL>         Include this tag.         Continue.
<SPACER>        Include this tag.         Continue.
<SPAN>          Include this tag.         Continue.
<SPOT>          Include this tag.         Continue.
<STRIKE>        Include this tag.         Continue.
<STRONG>        Include this tag.         Continue.
<STYLE>         Include this tag.         Stop.
<SUB>           Include this tag.         Continue.
<SUP>           Include this tag.         Continue.
<TAB>           Include this tag.         Continue.
<TABLE>         Include this tag.         Stop.
<TBODY>         Include this tag.         Stop.
<TD>            Include this tag.         Continue.
<TEXTAREA>      Include this tag.         Continue.
<TFOOT>         Include this tag.         Stop.
<TH>            Include this tag.         Continue.
<THEAD>         Include this tag.         Stop.
<TITLE>         Don't include this tag.   Stop.
<TR>            Include this tag.         Stop.
<TT>            Include this tag.         Continue.
<U>             Include this tag.         Continue.
<UL>            Include this tag.         Stop.
<VAR>           Include this tag.         Continue.
<WBR>           Include this tag.         Continue.
<XMP>           Include this tag.         Stop.
Any other tags: Include this tag.         Continue.
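
One plausible reading of the Tag / Action / Continue-Stop Scan columns is a lookup consulted while scanning outward from a passage of interest: each tag encountered is either kept or dropped, and the scan either goes on or ends there. The sketch below assumes that reading. The names `SCAN_RULES`, `tag_name` and `scan_forward` are hypothetical, only a handful of rows are copied in, and the scan direction is an assumption rather than something the table states.

```python
import re

# (include_tag, continue_scan) for a few rows copied from the table.
SCAN_RULES = {
    "<B>": (True, True),
    "</B>": (True, True),
    "<P>": (True, False),
    "</P>": (False, False),
    "<HR>": (False, False),
    "</TABLE>": (False, False),
}
DEFAULT_RULE = (True, True)  # "Any other tags": include, continue

TAG_RE = re.compile(r"<[^>]*>")


def tag_name(tag):
    """Normalise '<p class=x>' to '<P>' and '</p>' to '</P>'."""
    m = re.match(r"<\s*(/?)\s*([A-Za-z0-9]+)", tag)
    return "<%s%s>" % (m.group(1), m.group(2).upper()) if m else tag


def scan_forward(html, start):
    """Extend a passage from `start` until a tag whose rule says Stop."""
    out = []
    pos = start
    for m in TAG_RE.finditer(html, start):
        out.append(html[pos:m.start()])  # text preceding this tag
        include, keep_going = SCAN_RULES.get(tag_name(m.group()), DEFAULT_RULE)
        if include:
            out.append(m.group())
        pos = m.end()
        if not keep_going:
            return "".join(out)
    out.append(html[pos:])  # no stopping tag found: keep the rest
    return "".join(out)
```

Keeping the rules in a dictionary keyed by the normalised tag means the full table can be loaded without changing the scanning loop, and the "Any other tags" row falls out naturally as the default.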
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APPENDIX 

(1) Replace any filenames within the segments (e.g., within <A> or
<IMG>) that are specified with relative names (i.e., not full URLs) with
the appropriate full URLs.

(2) Remove the following tags and their contents:
<!...>
<BASEFONT . . . >
<COMMENT . . . >
<META . . . >
<FRAMESET . . . > </FRAMESET>
<TITLE> </TITLE>
<HEAD> </HEAD>
<LINK . . . >
<NEXTID . . . >

(3) Remove the following tags, but keep their contents:
<DIV . . . > </DIV>
<HTML> </HTML>
<SPAN . . . > </SPAN>

(4) Alter the following tags in the given manner:
<APPLET . . . > </APPLET> -- Replace the entire tag with an APPLET
icon which is linked to the entire source document.
<BGSOUND . . . > -- Replace the entire tag with a SOUND icon which is
linked to the particular sound.
<EMBED . . . > -- Replace the entire tag with an EMBED icon which is
linked to the entire source document.
<IMG . . . > -- Replace the entire tag with an IMAGE icon which is
linked to the particular image.
<MARQUEE . . . > </MARQUEE> -- Replace the entire tag with a MARQUEE
icon which is linked to the entire document.
<OBJECT . . . > </OBJECT> -- Replace the entire tag with an OBJECT
icon which is linked to the entire document.
<SCRIPT . . . > </SCRIPT> -- Replace the entire tag with a SCRIPT
icon which is linked to the entire document.
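
Rules (1) to (3) above can be approximated with a few regular-expression passes. The following is a minimal sketch rather than the system's actual implementation: the function names are hypothetical, only double-quoted `href`/`src` attributes and a subset of the listed tags are handled, and `http://example.org/dir/` in the usage note is a placeholder base URL. Rule (4), the icon substitution, is left out.

```python
import re
from urllib.parse import urljoin


def absolutize(html, base_url):
    """Rule (1): rewrite relative href/src values as full URLs."""
    def fix(m):
        return m.group(1) + urljoin(base_url, m.group(2)) + m.group(3)
    return re.sub(r'((?:href|src)\s*=\s*")([^"]*)(")', fix, html, flags=re.I)


def remove_with_contents(html, names=("TITLE", "HEAD", "FRAMESET")):
    """Rule (2), paired tags only: drop the tag and everything inside it."""
    for n in names:
        pattern = r"<%s\b[^>]*>.*?</%s\s*>" % (n, n)
        html = re.sub(pattern, "", html, flags=re.I | re.S)
    return html


def unwrap(html, names=("DIV", "HTML", "SPAN")):
    """Rule (3): drop the tags themselves but keep what they enclose."""
    for n in names:
        html = re.sub(r"</?%s\b[^>]*>" % n, "", html, flags=re.I)
    return html
```

Applied in order, `unwrap(remove_with_contents(absolutize(page, "http://example.org/dir/")))` leaves only the body markup with fully qualified links. A production cleaner would more likely use a real HTML parser, since regular expressions mishandle nested or malformed tags.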
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AltaVlst Search results: 



Word count: Monarch: 19925 

Documents 1-10 of about 300 matching the query, best matches first. 
Danaus plexippus [Monarch Butterfly]

Danaus plexippus. Monarch Butterfly. * Classification. Phylum:
Arthropoda. Class: Insecta. Order: Lepidoptera. Family: Nymphalidae.
Table of Contents ....

http://www.oiLitd.umich.edu/bio/doc.cgi/Arthropoda/Insecta/Lepidoptera/ 
Nymphalidae/Danaus_plexippus.ftl - size 9K - 1 Jul 96 
Monarch Butterfly 

STATE INSECT - MONARCH BUTTERFLY Danaus plexippus. The
monarch was chosen in 1975 to be Illinois' state insect. The third 
grade classes at Dennis School . . . 

http://dnr.statc.il.us/NREDU/CLASSRM/symbol/inscct.htm - size 2K - 29 
Apr 96 

Monarch Watch: Tracking the journey 

Monday, September 30, 1996. Monarch Watch: Tracking the journey. 
Internet resources. At left: Yuna Asriyan, left, Joe Nguyen and other
fourth graders watch. 

http://www.portland.net/ph/monews/story2.htm - size 10K - 30 Sep 96
Control of Monarch Parasite 

How Do I Control the Monarch Parasite? Dear Fellow Monarch Raiser: 
Many of us are distributing monarch butterflies as a means to excite the 
public, . . . 

http://monarch.bio.ukans.edu/parasitecontrol - size 9K - 5 Apr 96 
The Blake School: Monarch Butterfly Project, Research9 
Habitat Status of Monarch Butterflies in Mexico and the U.S. Morgan 
Steiner. Deforestation in Mexico is a major problem for the habitat of 
the monarch , . . 

http://www.blake.pvt.kl 2. mn.us/campus/projects/upper/monarchs/ 

rcsearch/stcinerl.html - size 2K - 2 Mar 96 

Journey North: Monarch Butterfly Updates 

Journey North News. Monarch Butterfly Migration Update: 

April 16, 1996.

Only five new monarch sightings were reported this week! Again, 
biologists Dr. ... . 

http://www.ties.kl2.nin.us/--jnorth/critters/monarch/829680914.html - size 
9K - 16 Apr 96 

Journey North: Monarch Butterfly Updates 

Journey North News. Monarch Migration Update: April 9, 1996. Our 
Internet Field Team is hard at work! There are many, many monarchs to 
map this week! By . . . 

http://www.ties.kl 2.mn.us/-jnorth/critters/monarch/829061947.html - size 
13K - 9 Apr 96 
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Journey North: Monarch Butterfly Updates 

Journey North News. Monarch Migration Update: March 26, 1996. A 
few new monarch sightings were reported this week. Here's a chart 
summarizing this . . . 

http:/Avww.ties.kl 2. mn.U5/~jnorth/critters/monarch/82785371 4.html size 
10K - 26 Mar 96
Monarch Migration 

Journey North's 3rd Annual Spring Monarch Migration Project Get ready! 
You're invited to take part in an international science project with 
students and . . . 

http://bvsd.kl2.co.us/monarch.html - size 4K - 17 Jun 96 
Journey North: Monarch Butterfly Updates 

Journey North News, Deforestation and the Monarch Butterfly Reserves, 
by Liz Olson, Grade 11, The Blake Schools. One of the major issues that 
has arisen . . . 

http://www.ties.kl2.mn.us/-jnorth/critters/monarch/826042770.html - size 
4K - 5 Mar 96 

p. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 [Next] 



Search and Display the Results 

Selection Criteria: Please use Advanced Syntax (AND, OR, NOT, NEAR). 
Results Ranking Criteria: Documents containing these words will be 
listed first. 

Start date: End date: e.g. 21/Mar/96 



Surprise . Legal. FAQ . Add URL . Feedback . Text-Only 



Copyright - 1996 Digital Equipment Corporation. All rights reserved. 
Word count: Monarch: 19925 (second page) 

Documents 11-20 of about 300 matching the query, best matches first.
Monarch Detectives 

Discovery of the Monarch Migration. Monarchs at an overwintering site in 
central Mexico. Research of the monarch migration and biology began 
around 1857 . . . 

http://monarch.bio.ukans.edu/migrtnhist.html - size 4K - 5 Apr 96
The Blake School: Monarch Butterfly Project, Field ReportsS 
Field Report: Monarch Mortality Estimates from the December Snowfall. 
By Todd Stiefler. December 30, 1995 a snowstorm hit Central Mexico 
dropping , , . 

http://www.blakc.pvt.kl2.mn.us/campus/projects/upper/monarchs/ 
reports/reports. html - size 4K - 14 Jun 96 
The Monarch Butterfly 

Monarch Butterfly. Did you know the bluejay is afraid of the male 
Monarch? They're also noctunal. The order of all butterflies is 
Lepidoptera, and they . . .

http ://tnfo .csd.org:70/WWW/schoo Is/pattonville/insect. museum/ 
butlerfly.h 

tml - size 3K - 4 Mar 96 
Monarch Butterfly Migration 

The Migratory Behavior of the Monarch Butterfly. Karen Hanson Nicki 
Nguyen Hien To. I. Introduction: The awesome sight of hundreds of 
monarch butterflies. 

http://genbiol.cbs.umn.edu/lQ09/1009h/monarchs.html - size 12K - 
28 May 96 

Guide to Pismo Beach - Monarch Butterflies 

Pismo Beach - Monarch Butterflies. The "Butterfly Trees" of Pismo Beach
are an added attraction to the city. From late November through 
February, . . . 

http://dial.net/pismo/monarch - size 6K - 21 Jun 96 

Journey North: Monarch Butterflies Migration

Monarch Butterflies Migration Updates will be posted on: Tuesdays.

Background Information. Migration Data Table. Journey North News. 

Monarch Butterflies . . . 

http://www.ties.kl2.mn.us/-jnorth/crittcrs/monarch/ - size 5K - 
30 May 96 

Pismo Beach Guide - Monarch Butterflies 

Pismo Beach - Monarch Butterflies. The "Butterfly Trees" of Pismo Beach 
are an added attraction to the city. From late November through 
February, . . . 

http://webmill.com/pismo/monarch - size 6K - 19 Jun 96
Follow the Monarch Butterfly Migration 

Follow the Monarch Butterfly Migration. Mexico. Nature Observation. 
Click on an image to see it in full view: Every autumn, . . . 
http://www.dcepriver.com/adven/htm/181.htm - size 4K - 13 Jan 96 
Monarch migration 

Where do Monarchs Go for Winter? A view of an oyamel (fir) forest on 
Sierra Chincua near Angangueo in central Mexico. Toward the end of
summer (late . . . 

http://monarch.bio.ukans.edu/migration.html - size 6K - 5 Apr 96
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Texas Monarch Watch 

Texas Monarch Watch Nongame Program Texas Parks and Wildlife 
4200 Smith School Road Austin, TX 78744. The Texas Monarch Watch. 
The Texas Monarch Watch is . . . 

http://monflrch.bio.ukans.edu/texosmw.html - size 19K - 5 Apr 96 
[Prev] p. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 [Next]



Search and Display the Results 
Selection Criteria: Please use Advanced Syntax (AND, OR, NOT, NEAR).
Results Ranking Criteria: Documents containing these words will be 
listed first. 

Start date: End date: e.g. 21/Mar/96
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Copyright - 1996 Digital Equipment Corporation. All rights reserved. 
Excite! Search results: 

Search Menu 
New Search 
Advanced Search 
Add URL 

Excite and AOL Sign Exclusive Agreement 
Check out our advertiser: Click Here 

Excite Search found 926 documents about: Monarch AND (butterflies OR 
spend) AND (winter). 
Check out Reviews! 
Arts 

Business 

Computing 

Education 

Entertainment 

Health 

Hobbies 

Life & Style 

Money 

News 

Personal Pages 

Politics & Law

Regional 

Science 

Shopping 

Sports 



Did You Know? 

Search results are sorted by relevance, indicated by a percentage
rating. Click 'Sort by Site' to see which websites have the most
documents. 

Go To

Excite Home 
Excite Search 
Excite Reviews 
Excite City.Net 
Excite Live 
Excite Reference 
Excite Tours 
Info 
Help 

Feedback 
Advertising 
Credits 
About Excite 



Did you know? 

Click on 'More Like This' to see more documents that pertain to your
search. 



Excite Search is sponsored in part by Sun Microsystems and run on 
10-CPU Ultra Enterprise 4000 servers.

Documents 1-10 sorted by confidence 

92% Travel - George H Winslow, Jr. [More Like This] 

URL: http://home.forbin.com/~gwinslow/travel.html 

Summary: With the coming of fall I am reminded that the Monarch 

butterflies are starting their annual migration to their winter 

hibernation grounds. When we arrived at the sanctuary we found 

approximately 5-7 million butterflies in just a few square miles of 
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mountain top. 

92% Butterflies Page [More Like This] 

UR L: http :/Avww.o ise .on. ca/-l asul 1 ivan/bu tie rfiies . html 

Summary: If you are a student in grades 4-6 looking for information to 

create a project on butterflies you are looking in the right place! .) 

Save The Butterfly (Dedicated to saving butterflies around the world) A 

New Butterfly Conservatory (A great place to visit if you have a 

graphic browser). 

92% Monarch Butterfly [More Like This] 
URL: http;//www.tnc.org./infield/spccics/monarch/monarch.htm 
Summary: Burnt orange, up to 4 inches across with black markings, 
Monarchs can be found virtually anywhere in the United States. In late 
September, for instance, they begin to congregate at Cache River Joint 
Wetlands Project in Illinois, a Conservancy preserve, and move southward 
for their annual migration. 

91% Follow the Monarch Butterfly Migration [More Like This] 
URL: http :/Avww,deeprive r. com/adve n/htm/18 1 . htm 
Summary: Click on an image to see it in full view: Every autumn, 
millions of monarch butterflies from North American begin an incredible 
migration, journeying south to spend the winter in central Mexico. 
The forests, which for centuries have been the winter haven of these 
butterflies, are now threatened by loggers.

91% On Six Legs Flies of the Butter Season S . . . [More Like This]
URL: 

http ://he rmes .ecn. purdue.cdu :8001/gophcr_dir/The%20 Pu rduc%20 
Coopcrative%20Extension%20Gopher%20Infonnation%20SeTver/ 
Current%20News/Archivcs/96/Apr/04-26/OSL:%20nies%20of% 
20the%20Buttcr%20Season - 

Summary: http://bloodshot.com:80/babble/puppy/candle.html
91% Illinois State Insect [More Like This] 

URL: http :/Avww.museum.statc.il.us:70/cxhibits/symbols/inscct. html 
Summary: Some monarchs remain in the vicinity of their breeding 
grounds; others fly north to lay eggs. If this was of interest, you might be 
interested in these other Internet resources on monarchs and other 
insects.. 

91% Monarch Watch [More Like This] 
URL: http://monarch.bio.ukans.edu/ 

Summary: Enjoy your visit, and come back often - we will be continually 

updating many areas. Please feel free to contact us if you are

interested in receiving more information about the Monarch Watch or are 

interested in participating in the fall tagging. 

91% K12> Send a Monarch to Mexico! [More Like This] 

URL: http: //www.gi.net/NET/PM- 1996/96-09/96-09-27/0045 .html 

Summary: Joined by a fragile butterfly, shared hope will. Journey North 

program will be featured, including beautiful footage.

91% Journey North [More Like This] 

URL: http://www.whro-pbs.org/LearningLink/monarchs.html 

Summary: Sometime next March, when the real monarchs' departure from

Mexico is announced, the paper butterflies will return to North America. 

What materials are needed to make sure your monarch survives its journey 

south, the winter months in Mexico, and its journey north next spring? 

91% Untitled [More Like This] 

URL: http://www.bell-atl.com/wschool/html/announce/oct/oct2096.htm 
Summary: The postage must be sufficient to mail the butterflies back to 
you from the Journey North office in the U.S. (The monarchs will not be 
mailed from Mexico, so either U.S. or Canadian postage is fine. Urqhuart 
tagged the first monarchs 59 years ago—and graciously agreed to tag the 
first paper monarchs for this symbolic migration! 
Check out our advertiser: Click Here 



Excite Search found 926 documents about: 
What: Where: [Help]
[Advanced Search] 

Super-charge your browser with Excite Direct. Click here!

-1996 Excite Inc. 

Disclaimer 

Search Menu 

New Search 

Advanced Search 

Add URL 

Excite and AOL Sign Exclusive Agreement 

Check out our advertiser: A Chance to Win a Free Kodak DC25 Camera 
Excite Search found 926 documents about: Monarch AND (butterflies OR 
spend) AND (winter)). 
Check out Reviews! 
Arts 

Business 

Computing 

Education 
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Entertainment 
Health 
Hobbies
Life & Style 
Money 
News 

Personal Pages 
Politics & Law 
Regional
Science 
Shopping 
Sports 



Did You Know? 

Search results are sorted by relevance, indicated by a percentage
rating. Click 'Sort by Site' to see which websites have the most
documents. 



Go To

Excite Home 
Excite Search 
Excite Reviews 
Excite City.Net 
Excite Live 
Excite Reference 
Excite Tours 
Info 
Help
Feedback 
Advertising 
Credits 
About Excite 

Did you know?

Click on 'More Like This' to see more documents that pertain to your 
search. 



Excite Search is sponsored in part by Sun Microsystems and run on 
10-CPU Ultra Enterprise 4000 servers.

Documents 11-20 sorted by confidence 

91% Monarch Population Plummets in Mexico [More Like This] 
URL: http://www.isit.com/buttcrfly/articles/scndmon.htm 
Summary: Urqhuart tagged the first monarchs 59 years ago-and 
graciously agreed to tag the first paper monarchs for this symbolic 
migration! All migratory monarchs east of the Rocky Mountains spend 
the winter in just 9 major sanctuaries in Mexico! 
91% Monarch Population Plummets in Mexico [More Like This] 
URL: http://mgfe.com/butterfly/articles/sendmon.htm 
Summary: Or feel free to follow one of the designs found on Journey 
North's WWW site at: http://www. Mail your butterflies in a large manila
envelope to: Journey North, 125 North First Street, Minneapolis,
Minnesota 55401 USA. 

90% Journey North program will be featured . . . [More Like This] 
URL: http://archives.gsn.org/oct96/0042.html 

Summary: http://www.freaknet.co.il:80/wwwboard/messages/147.html
90% Web gives students front-row seats for b . . . [More Like This]
URL: http://www.dispatch.com/news/newsfeatures/butterflies1112.html
Summary: Dispatch Schools Reporter November 12, 1996 Students in
Tine Gehres' science classes at Wedgewood Middle School watched the
summer slip away to Mexico on the wings of thousands of monarch 
butterflies. They reported their sightings each day to the Journey 
North site on the World Wide Web. That way, Gehres said, students 

everywhere could plot sightings on a map and track the southward.
90% ZooNews - 12 July 96 [More Like This]
URL: http://www.cpb.uokhsc.edu/OKC/OKCZoo/zn/ZooNew960912a.html

Summary: The Monarch butterfly, also known as the milkweed butterfly, 
is one of the world's most widely distributed butterflies and is one of 
only a few that migrate north and south like birds do for the winter. 
Monarchs that hatch and develop in the fall live longer and behave 
differently than those hatched earlier in the year. 
90% Riley said the insects respond to the s . . . [More Like This] 
URL: http://www.agctr.lsu.edu/wwwac/4nws1017.txt
Summary: http://cedar.ag.uiuc.edu:8001/CropSci/weed-lab/Bill/Bill.htm
90% Pismo Beach Guide - Monarch Butterflies [More Like This]
URL: http://webmill.com/pismo/monarch
Summary: The Butterflies will form dense clusters on the trees, each
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animal hanging with its wings down over the one below it to form a 
shingle effect, that gives some shelter from the rain and warmth for the 
group. If a Butterfly is dislodged it may fall victim to insects or
field mice, since it cannot fly at temperatures much lower than 55
degrees, and at a temperature lower than 40 degrees, it is. 
90% "The Wanderer" [More Like This] 

URL: http://www.adventure.com/library/encyclopedia/bug/rfimnarc.html
Summary: In another few weeks, more Monarchs mature from eggs that 
were laid in other places on milkweed; they also start northward. 
Migrating swarms of Monarchs may number in the tens of thousands 
and there have been years when flocks have been estimated to contain 
millions. 

90% Monarch Butterfly Migration [More Like This] 
URL: http://genbiol.cbs.umn.edu/1009/1009h/monarchs.html 
Summary: Traveling in a southwesterly direction, the monarchs fly east
of the Great Lakes and south-southwest in areas west of the Great Lakes. 
Presently, three sights along Ontario's great lakes have been designated 
as butterfly reserves. 

90% THE MONARCH BUTTERFLY SANCTUARY, NATURAL . . . [More Like This]
URL: http://www.mexico-travel.com/states/sl6/132zzl.htm
Summary: Urquhart put tags on the wings of some butterflies, and
followed their trails to Mexican territory, always motivated by the
question: Where do they spend the winter? However, the Monarch has an
important defense mechanism: they are toxic, and when eaten by birds,
they accelerate their cardiac rhythms causing death.
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4) Map the Monarchs! [86%, 3 of 4 terms] 
5) 'Netting Butterflies [86%, 3 of 4 terms]
6) gopher://gopher.informns.k12.mn.us/00/best-k12/monarchs/%20%20Map%20The%20Mona [85%, 3 of 4 terms]
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18) OMIX Marketspace Directory [73%, 3 of 4 terms] 

19) Nation World [73%, 3 of 4 terms] 

20) ITSS Digest [73%, 3 of 4 terms] 

Previous Page Next Page 
Jump down the list: 1 a.a2 
Previous 10 Pages Next 10 Pages 
Edit your search: 
Match all words Match any word 

Click on graphic to visit site. 

New Search . TopNews . Sites by Subject . Top 5% Sites . City Guide . 
Pictures & Sounds 

People Find . Point Review . Road Maps . Software . About Lycos . Club 
Lycos . Help 

Add Your Site to Lycos . Lycos Merchandise 
Copyright - 1996 Lycos(tm), Inc. All Rights Reserved.
Questions & Comments 



Appendix H 

av01: http://www.oit.itd.umich.edu/bio/doc.cgi/Arthropoda/Insecta/Lepidoptera/Nymphalidae/Danaus_plexippus.ftl
av02: http://dnr.state.il.us/NREDU/CLASSRM/symbol/insect.htm
av03: http://www.portland.net/ph/monews/story2.htm
av04: http://monarch.bio.ukans.edu/parasitecontrol
av05: http://www.blake.pvt.k12.mn.us/campus/projects/upper/monarchs/research/steinerl.html
av06: http://www.ties.k12.mn.us/~jnorth/critters/monarch/829680914.html
av07: http://www.ties.k12.mn.us/~jnorth/critters/monarch/829061947.html
av08: http://www.ties.k12.mn.us/~jnorth/critters/monarch/827853714.html
av09: http://bvsd.k12.co.us/monarch.html
av10: http://www.ties.k12.mn.us/~jnorth/critters/monarch/826042770.html
av11: http://monarch.bio.ukans.edu/migrlnhist.html
av12: http://www.blake.pvt.k12.mn.us/campus/projects/upper/monarchs/reports/report8.html
av13: http://info.csd.org:70/WWW/schools/pattonville/insect.museum/butterfly.html
av14: http://genbiol.cbs.umn.edu/1009/1009h/monarchs.html
av15: http://dial.net/pismo/monarch
av16: http://webmill.com/pismo/monarch
av17: http://www.deepriver.com/adven/htm/181.htm
av18: http://monarch.bio.ukans.edu/migration.html

ly01: http://www.doesgodexist.org/JanFeb96/Monarch.html
ly02: http://monarch.bio.ukans.edu/snow.html
ly03: http://www.ed.uiuc.edu/Activity-Structures/Information-Collections/Pooled-Data-Analysis/Map-The-Monarchs.html
ly04: http://riceinfo.rice.edu/armadillo/Ftbend/butterfly.html
ly05: http://compstat.wharton.upenn.edu:8001/-^Uer/newhope/oct.html
ly06: http://www.stolaf.edu/other/snap/nlinsects.html
ly07: http://www.mid.net/NET/PM-1995/95-02/95-02-16/0017.html
ly08: http://whyfiles.news.wisc.edu/006migration/
ly09: http://www.steiny.com/sc/cvc/30.html
ly10: http://www.infopoint.com/sc/cvc/30.html
ly11: http://prism.prs.k12.nj.us/WWW/OII/disc-pub/novae-group/0067.html
ly12: http://www.nccet.snre.umich.edu/EndSpp/Optilearn.html
ly13: http://nccet.snre.umich.edu/EndSpp/Optilearn.html
ly14: http://hub.terc.edu/ra/m5/ed-reform/postings/0514.html
ly15: http://prism.prs.k12.nj.us/WWW/OII/disc-pub/novae-group/0023.html
ly16: http://www.sugomusic.com/marketspace_dir/mindex.html
ly17: http://detnews.com/menu/Datil231.htm
ly18: http://www-leland.stanford.edu/group/itss/Org/Digest/Digest_Jan5_96.html

ex01: http://home.forbin.com/~gwinslow/travel.html
ex02: http://www.oise.on.ca/~lasullivan/butterflies.html
ex03: http://www.tnc.org./infield/species/monarch/monarch.htm
ex04: http://www.deepriver.com/adven/htm/181.htm
ex05: http://hermes.ecn.purdue.edu:8001/gopher_dir/The%20Purdue%20Cooperative%20Extension%20Gopher%20Information%20Server/Current%20News/Archives/96/Apr/04-26/OSL:%20Flies%20of%20the%20Butter%20Season
ex06: http://www.museum.state.il.us:70/exhibits/symbols/insect.html
ex07: http://monarch.bio.ukans.edu/
ex08: http://www.gi.net/NET/PM-1996/96-09/96-09-27/0045.html
ex09: http://www.whro-pbs.org/LearningLink/monarchs.html
ex10: http://www.bell-atl.com/wschool/html/announce/oct/oct2096.htm
ex11: http://www.isit.com/butterfly/articles/sendmon.htm
ex12: http://mgfx.com/butterfly/articles/sendmon.htm
ex13: http://archives.gsn.org/oct96/0042.html
ex14: http://www.dispatch.com/news/newsfeatures/butterflies1112.html
ex15: http://www.cpb.uokhsc.edu/OKC/OKCZoo/zn/ZooNew960912a.html
ex16: http://www.agctr.lsu.edu/wwwac/4nws1017.txt
ex17: http://webmill.com/pismo/monarch
ex18: http://www.adventure.com/library/encyclopedia/bug/rfimnarc.html
ex19: http://genbiol.cbs.umn.edu/1009/1009h/monarchs.html
ex20: http://www.mexico-travel.com/states/sl6/132zzl.htm
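
The three lists in this appendix overlap: entries av14 and ex19, for instance, give the same address. The following Python sketch shows one way such redundancies might be removed before the information sources are retrieved. It is an illustration only; the function names `normalize` and `merge_results` and the normalization rules (lower-casing the host, dropping a trailing dot and the default port 80) are assumptions and are not taken from the specification.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Build a comparison key for a URL (hypothetical normalization rules)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower().rstrip(".")
    # Treat an explicit default HTTP port the same as no port at all.
    port = f":{parts.port}" if parts.port and parts.port != 80 else ""
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{host}{port}{path}{query}"


def merge_results(lists: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Merge per-engine URL lists into one reduced list.

    Each entry keeps the first spelling seen and the engines that returned it.
    """
    merged: dict[str, tuple[str, list[str]]] = {}
    for engine, urls in lists.items():
        for url in urls:
            key = normalize(url)
            if key not in merged:
                merged[key] = (url, [engine])
            elif engine not in merged[key][1]:
                merged[key][1].append(engine)
    return list(merged.values())


if __name__ == "__main__":
    sample = {
        "av": ["http://genbiol.cbs.umn.edu/1009/1009h/monarchs.html"],
        "ex": ["http://genbiol.cbs.umn.edu/1009/1009h/monarchs.html",
               "http://monarch.bio.ukans.edu/"],
    }
    for url, engines in merge_results(sample):
        print(url, engines)
```

Recording the engines alongside each address would also allow a "Found with" line, as shown in Appendix K, to be produced for each result.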
1862, 6384 http://www.ed.uiuc.edu/Activity-Structures/Information-Collections/Pooled-Data-Analysis/Map-The-Monarchs.html
Score: 4.50772

1418, 1468 http://dnr.state.il.us/NREDU/CLASSRM/symbol/insect.htm
Score: 3.3608

759, 3239 http://www.museum.state.il.us:70/exhibits/symbols/insect.html
Score: 2.03616

2235, 2491 http://hermes.ecn.purdue.edu:8001/gopher_dir/The%20Purdue%20Cooperative%20Extension%20Gopher%20Information%20Server/Current%20News/Archives/96/Apr/04-26/OSL:%20Flies%20of%20the%20Butter%20Season
Score: 1.50753

1862, 1972 http://www.mid.net/NET/PM-1995/95-02/95-02-16/0017.html
Score: 1.35457

8122, 8245 http://www.oit.itd.umich.edu/bio/doc.cgi/Arthropoda/Insecta/Lepidoptera/Nymphalidae/Danaus_plexippus.ftl
Score: 1.34016

729, 801 http://home.forbin.com/~gwinslow/travel.html
Score: 1.20644

2944, 3086 http://monarch.bio.ukans.edu/snow.html
Score: 1.03585

30, 1360 http://www.deepriver.com/adven/htm/181.htm
Score: 0.892374

6329, 6598 http://riceinfo.rice.edu/armadillo/Ftbend/butterfly.html
Score: 0.882793

39, 645 http://www.blake.pvt.k12.mn.us/campus/projects/upper/monarchs/research/steinerl.html
Score: 0.851319

876, 3326 http://www.oise.on.ca/~lasullivan/butterflies.html
Score: 0.460421

331, 510 http://www.stolaf.edu/other/snap/nlinsects.html
Score: 0.46026

324, 1225 http://www.doesgodexist.org/JanFeb96/Monarch.html
Score: 0.258716

3469, 3636 http://www.ties.k12.mn.us/~jnorth/critters/monarch/829061947.html
Score: 0.224219

32, 429 http://monarch.bio.ukans.edu/parasitecontrol
Score: 0.151446

3031, 3840 http://www.ties.k12.mn.us/~jnorth/critters/monarch/829680914.html
Score: 0.0425499



Appendix J 

STATE INSECT - MONARCH BUTTERFLY Danaus plexippus
The monarch was chosen in 1975 to be Illinois' state insect. The third 
grade classes at Dennis School in Decatur originally recommended the 
species. 

The monarch undergoes four changes in form (metamorphoses) during its
lifetime. It begins as a tiny egg. In its second stage it becomes a 
black, yellow and white striped caterpillar (larva). During this stage, 
the monarch caterpillar sheds its skin (molts) up to four times as it 
grows to its full length of about 2 inches. The monarch larva feeds only 
on the milkweed plant. Luckily for the larvae, the juices of the 
milkweed make the larvae taste terrible to birds and they rarely get 
eaten. In the third stage, the monarch forms a protective covering 
called a chrysalis (pupa). This pupa is shiny and green with gold 
speckles. During this time the monarch undergoes its final change. When 
it emerges from its sac, out comes a beautiful black and orange monarch 
butterfly. This entire process takes about a month. There are usually 
three to four generations of monarchs produced each year. 
While most insects hibernate, the monarch is the only species of 
butterfly which actually flies to warmer weather (migrates) in winter. 
Monarchs from Illinois spend their winters in California and Mexico. In 
the fall, people have reported seeing entire trees covered with 
thousands of migrating monarchs! However, only about 1 percent of these 
monarchs actually survive the journey back to Illinois. 



Back 

Almost America's national butterfly, the flamboyant Monarch is among
the best-known butterflies. Burnt orange, up to 4 inches across with
black markings, Monarchs can be found virtually anywhere in the United
States. Besides making the Monarch beautiful to watch, its orange color
serves another purpose. Butterfly wings display an array of disguises to
confuse predators. One of the most effective defenses, demonstrated so
famously by the monarch, is the display of bright colors to signify
distastefulness. Monarchs favor nectar from the poisonous milkweed
plant. That makes monarchs untasty, and birds learn this early on,
avoiding monarchs and other butterflies (such as viceroys) that look
like them.

The Monarch is the only butterfly that annually migrates both north and 
south. In late September, for instance, they begin to congregate at 
Cache River Joint Wetlands Project in Illinois, a Conservancy preserve, 
and move southward for their annual migration. By October, they've
flown hundreds of miles. Millions of Monarch butterflies return to their
winter habitat in Mexico via the Devils River Corridor, which flows
through the heart of Dolan Ranch Preserve in Texas. Monarchs continue
south to the Sierra Madre of Mexico, where they spend the winter. By
January, the Mexican fir trees and mountainsides are full of Monarchs,
drifting, gliding, fluttering and basking.

But no individual makes the entire round-trip journey. As they head
north in spring, Monarchs breed along the way and their offspring return
to the starting point. Still the Monarch is among the longest-lived
butterflies, lasting about 10 months between chrysalis and the day it 
dies. 

Photo Credits 

Monarch Butterfly (c) Terry Cook 
Copyright - 1996, The Nature Conservancy. 



APPENDIX K 

Result #2

Ranking: 3.3608 
From: dnr.state.il.us 
Found with: Alta Vista 

While most insects hibernate, the monarch is the only species of 
butterfly which actually flies to warmer weather (migrates) in winter.
Monarchs from Illinois spend their winters in California and Mexico. In 
the fall, people have reported seeing entire trees covered with 
thousands of migrating monarchs! However, only about 1 percent of these 
monarchs actually survive the journey back to Illinois. 
Result #3 

Ranking: 2.47543
From: www.tnc.org. 
Found with: Excite 

The Monarch is the only butterfly that annually migrates both north and 
south. In late September, for instance, they begin to congregate at 
Cache River Joint Wetlands Project in Illinois, a Conservancy preserve,
and move southward for their annual migration. By October, they've
flown hundreds of miles. Millions of Monarch butterflies return to their
winter habitat in Mexico via the Devils River Corridor, which flows
through the heart of Dolan Ranch Preserve in Texas. Monarchs continue
south to the Sierra Madre of Mexico, where they spend the winter. By
January, the Mexican fir trees and mountainsides are full of Monarchs,
drifting, gliding, fluttering and basking.
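
Both results above begin and end on sentence boundaries. As a rough illustration of how a selected portion might be widened to whole sentences, the Python sketch below extends a character range outward to the nearest sentence terminators. The function name `snap_to_sentences` and the use of character offsets are assumptions made for this example. The rule is deliberately naive: it treats every '.', '!' or '?' as a sentence end, so abbreviations and decimal numbers would mislead it, and it only adds adjacent content and never removes any.

```python
def snap_to_sentences(text: str, start: int, end: int) -> str:
    """Widen text[start:end] so it starts and ends on sentence boundaries."""
    if not 0 <= start < end <= len(text):
        raise ValueError("start/end must describe a non-empty range inside text")

    terminators = ".!?"

    # Move left until the previous character ends a sentence (or text begins).
    left = start
    while left > 0 and text[left - 1] not in terminators:
        left -= 1

    # Move right until the last included character ends a sentence (or text ends).
    right = end
    while right < len(text) and text[right - 1] not in terminators:
        right += 1

    return text[left:right].strip()


if __name__ == "__main__":
    page = ("While most insects hibernate, the monarch migrates in winter. "
            "Monarchs from Illinois spend their winters in California and Mexico. "
            "In the fall, people have reported seeing entire trees covered.")
    a = page.index("Illinois")
    b = page.index("California")
    print(snap_to_sentences(page, a, b))
```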



We claim: 

1. A method of locating information in at least one
information source using at least one search engine, comprising
the steps of:

(i) accepting a natural language query describing desired
information;

(ii) parsing said natural language query to extract terms
relevant to said desired information;

(iii) creating search data comprising at least two search
candidates from said extracted terms in a form appropriate
to each of said at least one search engine, and
transferring said created search data to said each of said
at least one search engine to initiating a search;

(iv) receiving search results comprising at least one list of
information sources from said each of said at least one
search engine, and removing redundancies from said at
least one list of information sources to obtain a reduced
list of information sources;

(v) retrieving complete copies of each information source
in said reduced list;

(vi) examining each said retrieved complete copy relative
to said at least two search candidates to determine a
match ranking therefor by:

(a) arranging each said complete copy into segments,
each segment defining the contents of said document
between at least three consecutive matches between
said complete copy and any of said at least two
search candidates;

(b) examining each segment in said complete copy to
determine a segment score comprising a score for
each match between the contents of said complete
copy and each search candidate, and weighting said
segment score with respect to the length of said
segment;

(c) selecting at least two segments of said complete
copy with the highest weighted segment scores from
step (b);

(d) for each selected segment, augmenting the segment
to include the contents of said complete copy
between the selected segment and an adjacent match
and performing step (b) for each augmented segment
to obtain an updated segment score;

(e) while said updated segment score for an augmented
segment is greater than said segment score, performing
step (d);

(f) selecting said augmented segment with the highest
updated segment score from each said complete
copy; and

(g) ranking the selected augmented segments for each
said complete copy according to said updated segment
scores,

(vii) selecting at least the highest ranked selected augmented
segment for display to the user, and editing
each said at least highest ranked selected segment to
form a complete segment by examining the beginning
and end of said segment and adding or removing
adjacent content of said complete copy to form a
substantially grammatically correct segment; and

(viii) providing said each said substantially grammatically
correct segment to said user.

2. The method as defined in claim 1 wherein step (vii)
further comprises the step of removing predefined formatting
information from said segment.

3. The method as claimed in claim 2 wherein said
formatting information comprises HTML tags.

4. The method of claim 1 wherein in step (vii), said complete
segments comprise at least one complete sentence.

5. The method of claim 1 where in step (vii), said
[illegible].

6. The method of claim 1 wherein said at least one
information source comprises an HTML formatted document.

7. The method of claim 6 wherein said at least one
information source is accessible via a telecommunications
network.

8. The method of claim 7 wherein said at least one search
engine is accessible via a telecommunications network.

9. The method of claim 6 wherein at least first and second
retrieved complete copies are obtained, said first complete
retrieved copy being obtained from a first location connected
to said telecommunications network and said second complete
retrieved copy being obtained from a second location
connected to said telecommunications network.

10. The method of claim [illegible] wherein at least first and
second search engines are employed, said first search engine
being located at a first location connected to said
telecommunications network and said second search engine being
located at a second location connected to said
telecommunications network.

11. The method of claim 6 wherein said at least one
information source is located on a computer network.

12. The method of claim 11, further comprising a plurality
of information sources, each information source being
located on said computer network.

13. The method of claim 11 wherein said at least one
search engine is located on said computer network.

*****
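
The segment selection recited in step (vi) of claim 1 can be illustrated with a short Python sketch. It is a simplified reading under stated assumptions and not the patented implementation: each search candidate is treated as a single word, every match contributes a score of 1, and the length weighting is taken to be division by the square root of the segment length in words. The claim fixes none of these choices, and the function names `find_matches`, `score`, `grow` and `best_segment` are hypothetical.

```python
import math


def find_matches(words, terms):
    """Indices of words equal to any search term, ignoring case and punctuation."""
    wanted = {t.lower() for t in terms}
    return [i for i, w in enumerate(words)
            if w.strip(".,;:!?()\"'").lower() in wanted]


def score(matches, lo, hi):
    """Step (b): number of matches in the segment, damped by its length in words."""
    length = matches[hi] - matches[lo] + 1
    return (hi - lo + 1) / math.sqrt(length)


def grow(matches, lo, hi):
    """Steps (d)-(e): extend to an adjacent match while the score improves."""
    best = score(matches, lo, hi)
    while True:
        options = []
        if lo > 0:
            options.append((score(matches, lo - 1, hi), lo - 1, hi))
        if hi < len(matches) - 1:
            options.append((score(matches, lo, hi + 1), lo, hi + 1))
        if not options:
            break
        candidate = max(options)
        if candidate[0] <= best:
            break
        best, lo, hi = candidate
    return best, lo, hi


def best_segment(text, terms, seeds=2):
    """Return (score, segment text) for one document, or None if too few matches."""
    words = text.split()
    matches = find_matches(words, terms)
    if len(matches) < 3:
        return None
    # Step (a): one initial segment per run of three consecutive matches.
    initial = [(score(matches, i, i + 2), i, i + 2)
               for i in range(len(matches) - 2)]
    # Step (c): keep the highest scoring initial segments.
    top = sorted(initial, reverse=True)[:seeds]
    # Steps (d)-(f): grow each seed and keep the best augmented segment.
    best, lo, hi = max(grow(matches, s_lo, s_hi) for _, s_lo, s_hi in top)
    return best, " ".join(words[matches[lo]:matches[hi] + 1])
```

Under these assumptions, step (g) would amount to sorting the per-document results of `best_segment` by score in descending order before the highest ranked segments are edited and displayed.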
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